[issue1602] windows console doesn't print or input Unicode
David-Sarah Hopwood david-sa...@jacaranda.org added the comment: Giampaolo: See #msg120700 for why that won't work, and the subsequent comments for what will work instead (basically, using WriteConsoleW and a workaround for a Windows API bug). Also see the prototype win_console.patch from Victor Stinner: #msg145963 -- ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue1602 ___ ___ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue1602] windows console doesn't print or input Unicode
David-Sarah Hopwood david-sa...@jacaranda.org added the comment:

Glenn wrote:
> So if flush checks that bit, maybe TextIOWriter could just call buffer.flush, and it would be fast if clean and slow if dirty?

Yes. I'll benchmark how much overhead is added by the calls to flush; there's no point in breaking the abstraction boundary of BufferedWriter if it doesn't give a significant performance benefit. (I suspect that it might not, because Windows is very slow at scrolling a console, which might make the cost of flushing insignificant in comparison.)
[issue11395] print(s) fails on Windows with long strings
David-Sarah Hopwood david-sa...@jacaranda.org added the comment:

If I understand the bug in the Windows console functions correctly, a limit of 32767 bytes might not always be small enough. The problem is that if two or more threads are concurrently using any console functions (which all use the same 64 KiB heap), they could try to allocate up to 32767 bytes plus overhead at the same time, which will fail.

I wasn't able to provoke this by writing to sys.stdout.buffer (maybe there is locking that prevents concurrent writes), but the following code, which calls WriteFile directly, does provoke it. GetLastError() returns 8 (ERROR_NOT_ENOUGH_MEMORY; see http://msdn.microsoft.com/en-us/library/ms681382%28v=vs.85%29.aspx), indicating that it's the same bug.

    # Warning: this test may DoS your system.

    from threading import Thread
    import sys
    from ctypes import WINFUNCTYPE, windll, POINTER, byref, c_int
    from ctypes.wintypes import BOOL, HANDLE, DWORD, LPVOID, LPCVOID

    GetStdHandle = WINFUNCTYPE(HANDLE, DWORD)(("GetStdHandle", windll.kernel32))
    WriteFile = WINFUNCTYPE(BOOL, HANDLE, LPCVOID, DWORD, POINTER(DWORD), LPVOID) \
                    (("WriteFile", windll.kernel32))
    GetLastError = WINFUNCTYPE(DWORD)(("GetLastError", windll.kernel32))
    STD_OUTPUT_HANDLE = DWORD(-11)
    INVALID_HANDLE_VALUE = DWORD(-1).value

    hStdout = GetStdHandle(STD_OUTPUT_HANDLE)
    assert hStdout is not None and hStdout != INVALID_HANDLE_VALUE

    L = 32760
    data = b'a'*L

    def run():
        n = DWORD(0)
        while True:
            ret = WriteFile(hStdout, data, L, byref(n), None)
            if ret == 0 or n.value != L:
                print(ret, n.value, GetLastError())
                sys.exit(1)

    [Thread(target=run).start() for i in range(10)]

--
nosy: +davidsarah
[issue1602] windows console doesn't print or input Unicode
David-Sarah Hopwood david-sa...@jacaranda.org added the comment:

First a minor correction:

> The new requirement would be that a correct app also needs to flush between a sequence of buffer.writes (that end in an incomplete line, or always if PYTHONUNBUFFERED or python -u is used), and a sequence of writes.

That should be "and only if PYTHONUNBUFFERED or python -u is not used".

I also said:

> If an app sets the .buffer attribute of sys.std{out,err}, it would fall back to using that buffer in the same way as when the fd is redirected.

but the .buffer attribute is readonly, so this case can't occur.

Glenn Linderman wrote:
> Would it suffice if the new scheme internally flushed after every buffer.write? It wouldn't be needed after write, because the correct application would already do one there?

Yes, that would be sufficient.

> Am I off-base in supposing that the performance of buffer.write is expected to include a flush (because it isn't expected to be buffered)?

It is expected to be line-buffered. So an app might expect that printing characters one-at-a-time will have reasonable performance.

In any case, given that the buffer of the initial std{out,err} will always be a BufferedWriter object (since .buffer is readonly), it would be possible for the TextIOWrapper to test a dirty flag in the BufferedWriter, in order to check efficiently whether the buffer needs flushing on each write. I've looked at the implementation complexity cost of this, and it doesn't seem too bad.

A similar issue arises for stdin: to maintain strict compatibility, every read from a TextIOWrapper attached to an input console would have to drain the buffer of its buffer object, in case the app has read from it. This is a bit tricky because the bytes drained from the buffer have to be converted to Unicode, so what happens if they end part-way through a multibyte character? Ugh, I'll have to think about that one.

Victor STINNER wrote:
> Some developers already think that adding sys.stdout.flush() after print("Processing.. ", end='') is too hard (#11633).

IIUC, that bug is about the behaviour of 'print', and didn't suggest changing the fact that sys.stdout is line-buffered.

By the way, are these changes going to be in a major release? If I understand correctly, the layout of structs (for standard library types not prefixed with '_', such as 'buffered' in bufferedio.c or 'textio' in textio.c) can change with major releases but not with minor releases, correct?
[issue1602] windows console doesn't print or input Unicode
David-Sarah Hopwood david-sa...@jacaranda.org added the comment:

I wrote:
> A similar issue arises for stdin: to maintain strict compatibility, every read from a TextIOWrapper attached to an input console would have to drain the buffer of its buffer object, in case the app has read from it. This is a bit tricky because the bytes drained from the buffer have to be converted to Unicode, so what happens if they end part-way through a multibyte character? Ugh, I'll have to think about that one.

It seems like there is no correct way for an app to read from both sys.stdin and sys.stdin.buffer (even without these console changes). It must choose one or the other.
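The "part-way through a multibyte character" problem raised above can be illustrated with the standard library's incremental decoders. This is only a sketch of the general technique, not what the proposed console patch does; the helper name decode_chunks is made up for the example.

```python
import codecs

def decode_chunks(chunks, encoding="utf-8"):
    """Decode byte chunks that may split a multibyte character (hypothetical helper)."""
    decoder = codecs.getincrementaldecoder(encoding)()
    pieces = []
    for chunk in chunks:
        # Incomplete trailing bytes are held back inside the decoder
        # until the rest of the character arrives in a later chunk.
        pieces.append(decoder.decode(chunk))
    # final=True raises UnicodeDecodeError if a character is still incomplete.
    pieces.append(decoder.decode(b"", final=True))
    return pieces

encoded = "\u00e9\u20ac".encode("utf-8")   # two characters, five bytes
# Split in the middle of both characters.
print(decode_chunks([encoded[:1], encoded[1:3], encoded[3:]]))
```

A reader that drains a byte buffer would need to keep such decoder state between drains, which is part of why mixing sys.stdin and sys.stdin.buffer is awkward.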
[issue1602] windows console doesn't print or input Unicode
David-Sarah Hopwood david-sa...@jacaranda.org added the comment:

I wrote:
> The only caveat would be that if you write a partial line to the buffer object (or if you set the buffer object to be fully buffered and write to it), and then write to the text stream, the buffer wouldn't be flushed before the text is written.

Actually it looks like that already happens (because the sys.std{out,err} TextIOWrappers are line-buffered separately from their underlying buffers), so it would not be an incompatibility:

    $ python3 -c 'import sys; sys.stdout.write("foo"); sys.stdout.buffer.write(b"bar"); sys.stdout.write("baz\n")'
    barfoobaz
[issue1602] windows console doesn't print or input Unicode
David-Sarah Hopwood david-sa...@jacaranda.org added the comment:

I wrote:
>     $ python3 -c 'import sys; sys.stdout.write("foo"); sys.stdout.buffer.write(b"bar"); sys.stdout.write("baz\n")'
>     barfoobaz

Hmm, the behaviour actually would differ here: the proposed implementation would print

    foobaz
    bar

(the "foobaz\n" is written by a call to WriteConsoleW and then the "bar" gets flushed to stdout when the process exits).

But since the naive expectation is "foobarbaz\n" and you already have to flush after each call in order to get that, I think this change in behaviour would be unlikely to affect correct applications.
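The ordering effect discussed above can be reproduced without a console by stacking the same io layers over an in-memory stream. This is a sketch under the assumption that a line-buffered TextIOWrapper over a BufferedWriter behaves like the default sys.stdout stack; the function name interleave is made up for the example.

```python
import io

def interleave(flush_between):
    """Write text, bytes, text through the same stack; return what reached the raw stream."""
    raw = io.BytesIO()
    buffer = io.BufferedWriter(raw)
    # newline="\n" disables newline translation so the result is platform-independent.
    text = io.TextIOWrapper(buffer, encoding="utf-8",
                            line_buffering=True, newline="\n")
    text.write("foo")
    if flush_between:
        text.flush()        # push pending text down to the raw stream
    buffer.write(b"bar")
    if flush_between:
        buffer.flush()      # push pending bytes down to the raw stream
    text.write("baz\n")
    text.flush()
    return raw.getvalue()

print(interleave(flush_between=False))  # order depends on the two buffering layers
print(interleave(flush_between=True))   # flushing at each switch keeps program order
```

With a flush at every switch between the text layer and the byte layer, the output order matches the order of the calls, which is the rule a correct application is assumed to follow in this thread.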
[issue1602] windows console doesn't print or input Unicode
David-Sarah Hopwood david-sa...@jacaranda.org added the comment:

Glenn Linderman wrote:
> Presently, a correct application only needs to flush between a sequence of writes and a sequence of buffer.writes.

Right. The new requirement would be that a correct app also needs to flush between a sequence of buffer.writes (that end in an incomplete line, or always if PYTHONUNBUFFERED or python -u is used), and a sequence of writes.

> Don't assume the flush happens after every write, for a correct application.

It's rather hard to implement this without any change in behaviour. Or rather, it isn't hard if the TextIOWrapper were to flush its underlying buffer before each time it writes to the console, but I'd be concerned about the extra overhead of that call. I'd prefer not to do that unless the new requirement above leads to incompatibilities in practice.
[issue1602] windows console doesn't print or input Unicode
David-Sarah Hopwood david-sa...@jacaranda.org added the comment:

(For anyone wondering about the hold-up on this bug, I ended up switching to Ubuntu. Not to worry, I now have Python 3 building in XP under VirtualBox -- which is further than I ever got with my broken Vista install :-/ It seems to behave identically to native XP as far as this bug is concerned.)

Victor STINNER wrote:
> The question is now how to integrate WriteConsoleW() into Python without breaking the API, for example:
> - Should sys.stdout be a TextIOWrapper or not?

It pretty much has to be a TextIOWrapper for compatibility. Also it's easier to implement it that way, because the text stream object has to be able to fall back to using the buffer if the fd is redirected.

> - Should sys.stdout.fileno() returns 1 or raise an error?

Return sys.stdout.buffer.fileno(), which is 1 unless redirected.

This is the Right Thing because in Windows, fds are an abstraction of the C runtime library, and the C runtime allows an fd to be associated with a console. In that case, from the application's point of view it is still writing to the same fd. In fact, we'd be implementing this by calling the WriteConsoleW win32 API directly in order to avoid bugs in the CRT's Unicode support, but that's an implementation detail.

> - What about sys.stdout.buffer: should sys.stdout.buffer.write() calls WriteConsoleA() or sys.stdout should not have a buffer attribute?

I was thinking that sys.std{out,err}.buffer would still be set up exactly as they are now. Then if an app writes to that buffer, it will get interleaved with any writes via the text stream. (The writes to the buffer go to the underlying fd, which probably ends up calling WriteFile at the win32 level.)

> I think that many modules and programs now rely on sys.stdout.buffer to write directly bytes into stdout. There is at least python -m base64.

That would just work. The only caveat would be that if you write a partial line to the buffer object (or if you set the buffer object to be fully buffered and write to it), and then write to the text stream, the buffer wouldn't be flushed before the text is written. I think that is fine as long as it is documented.

If an app sets the .buffer attribute of sys.std{out,err}, it would fall back to using that buffer in the same way as when the fd is redirected.

> - Should we use ReadConsoleW() for stdin?

Yes. I'll probably start with a patch that just handles std{out,err}, though.
[issue1602] windows console doesn't print or input Unicode
David-Sarah Hopwood david-sa...@jacaranda.org added the comment:

Feedback from Julie Solon of Microsoft:

> These console functions share a per-process heap that is 64K. There is some overhead, the heap can get fragmented, and calls from multiple threads all affect how much is available for this buffer.
>
> I am working to update the documentation for this function [WriteConsoleW] and other affected functions with information along these lines, and will post it within the next week or two.

I replied thanking her and asking for clarification:

> When you say that the heap can get fragmented, is this true only when there are concurrent calls to the console functions, or can it occur even with single-threaded use? I'm trying to determine whether acquiring a process-global lock while calling these functions would be sufficient to ensure that the available heap space will not be unexpectedly low. (This assumes that the functions are not used outside the lock by other libraries in the same process.)

ReadConsoleW seems also to be affected, incidentally.

I've asked for clarification about whether acquiring a process-global lock when using these functions ...

Julie
[issue1602] windows console doesn't print utf8 (Py30a2)
Changes by David-Sarah Hopwood david-sa...@jacaranda.org:

--
nosy: +BreamoreBoy
versions: +Python 3.1, Python 3.2 -Python 3.3
Added file: http://bugs.python.org/file20360/doc-patch.diff
[issue1602] windows console doesn't print utf8 (Py30a2)
Changes by David-Sarah Hopwood david-sa...@jacaranda.org:

--
nosy: +BreamoreBoy
versions: +Python 3.1, Python 3.2 -Python 3.3
Added file: http://bugs.python.org/file20361/doc-patch.diff
[issue1602] windows console doesn't print utf8 (Py30a2)
Changes by David-Sarah Hopwood david-sa...@jacaranda.org:

Added file: http://bugs.python.org/file20362/doc-patch.diff
[issue1602] windows console doesn't print or input Unicode
Changes by David-Sarah Hopwood david-sa...@jacaranda.org:

--
title: windows console doesn't print utf8 (Py30a2) -> windows console doesn't print or input Unicode
Added file: http://bugs.python.org/file20363/doc-patch.diff
[issue1602] windows console doesn't print utf8 (Py30a2)
David-Sarah Hopwood david-sa...@jacaranda.org added the comment:

I'll have a look at the Py3k I/O internals and see what I can do. (Reopening a bug appears to need Coordinator permissions.)
[issue1602] windows console doesn't print utf8 (Py30a2)
David-Sarah Hopwood david-sa...@jacaranda.org added the comment:

> The script unicode2.py uses the console STD_OUTPUT_HANDLE iff sys.stdout.fileno()==1.

You may have missed "if not_a_console(hStdout): real_stdout = False". not_a_console uses GetFileType and GetConsoleMode to check whether that handle is directed to something other than a console.

> But is it always the case?

The technique used here for detecting a console is almost the same as the code for IsConsoleRedirected at http://blogs.msdn.com/b/michkap/archive/2010/05/07/10008232.aspx , or in WriteLineRight at http://blogs.msdn.com/b/michkap/archive/2010/04/07/9989346.aspx (I got it from that blog, can't remember exactly which page).

[This code will give a false positive in the strange corner case that stdout/stderr is redirected to a console *input* handle. It might be better to use GetConsoleScreenBufferInfo instead of GetConsoleMode, as suggested by http://stackoverflow.com/questions/3648711/detect-nul-file-descriptor-isatty-is-bogus/3650507#3650507 .]

> What about pythonw.exe?

I just tested that, using pythonw run from cmd.exe with stdout redirected to a file; it works as intended. It also works (for both console and non-console cases) when the handles are inherited from a parent process.

Incidentally, what's the earliest supported Windows version for Py3k? I see that http://www.python.org/download/windows/ mentions Windows ME. I can fairly easily make it fall back to never using WriteConsoleW on Windows ME, if that's necessary.
[issue1602] windows console doesn't print utf8 (Py30a2)
David-Sarah Hopwood david-sa...@jacaranda.org added the comment:

Note: Michael Kaplan's code checks whether GetConsoleMode failed due to ERROR_INVALID_HANDLE. My code intentionally doesn't do that, because it is correct and conservative to fall back to the non-console behaviour when there is *any* error from GetConsoleMode. (It could also fail due to not having the GENERIC_READ right on the handle, for example.)
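For readers without unicode2.py at hand, the detection described above (GetFileType plus GetConsoleMode, treating any GetConsoleMode failure as "not a console") might look roughly like the following. This is a reconstruction from the description in this thread, not the script itself; the function name stdout_is_console and the early return on non-Windows platforms are assumptions made for the sketch.

```python
import sys

FILE_TYPE_CHAR = 0x0002
FILE_TYPE_REMOTE = 0x8000
STD_OUTPUT_HANDLE = -11

def stdout_is_console():
    """Return True only if the standard output handle looks like a Windows console."""
    if sys.platform != "win32":
        return False  # the Win32 calls below do not exist elsewhere
    from ctypes import WINFUNCTYPE, windll, POINTER, byref, c_void_p
    from ctypes.wintypes import BOOL, HANDLE, DWORD

    GetStdHandle = WINFUNCTYPE(HANDLE, DWORD)(("GetStdHandle", windll.kernel32))
    GetFileType = WINFUNCTYPE(DWORD, HANDLE)(("GetFileType", windll.kernel32))
    GetConsoleMode = WINFUNCTYPE(BOOL, HANDLE, POINTER(DWORD))(
        ("GetConsoleMode", windll.kernel32))

    handle = GetStdHandle(DWORD(STD_OUTPUT_HANDLE))
    if handle is None or handle == c_void_p(-1).value:  # NULL or INVALID_HANDLE_VALUE
        return False
    if (GetFileType(handle) & ~FILE_TYPE_REMOTE) != FILE_TYPE_CHAR:
        return False  # a disk file or pipe, so output is redirected
    # Conservative: *any* failure of GetConsoleMode means "not a console".
    return GetConsoleMode(handle, byref(DWORD())) != 0

print(stdout_is_console())
```

As noted earlier in the thread, a check based on GetConsoleMode can still be fooled when output is redirected to a console input handle.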
[issue5924] When setting complete PYTHONPATH on Python 3.x, paths in the PYTHONPATH are ignored
David-Sarah Hopwood david-sa...@jacaranda.org added the comment:

Looking at http://svn.python.org/view/python/branches/py3k/PC/getpathp.c?r1=73322&r2=73321&pathrev=73322 , wouldn't it be better to add a Py_WGETENV function? There are likely to be other cases where that would be the correct thing to use.

--
nosy: +davidsarah
[issue1602] windows console doesn't print utf8 (Py30a2)
David-Sarah Hopwood david-sa...@jacaranda.org added the comment:

> ... os.dup2() ...

Good point, thanks.

It would work to change os.dup2 so that if its second argument is 0, 1, or 2, it calls _get_osfhandle to get the Windows handle for that fd, and then reruns the console-detection logic. That would even allow Unicode output to work after redirection to a different console.

Programs that directly called the CRT dup2 or SetStdHandle would bypass this. Can we consider such programs to be broken? Methinks a documentation patch for os.dup2 would be sufficient, something like:

  When fd2 refers to the standard input, output, or error handles (0, 1 and 2 respectively), this function also ensures that state associated with Python's initial sys.{stdin,stdout,stderr} streams is correctly updated if needed. It should therefore be used in preference to calling the C library's dup2, or similar APIs such as SetStdHandle on Windows.
[issue1602] windows console doesn't print utf8 (Py30a2)
David-Sarah Hopwood david-sa...@jacaranda.org added the comment:

haypo wrote:
> davidsarah wrote:
>> It is certainly possible to write Unicode to the console successfully using WriteConsoleW
>
> Did you try with characters not encodable to the code page and with characters that cannot be rendered by the font?

Yes, characters not encodable to the code page do work (as confirmed by Glenn Linderman, since code page 437 does not include Cyrillic).

Characters that cannot be rendered by the font print as missing-glyph boxes, as expected. They don't cause any other problem, and they can be cut-and-pasted to other Unicode-aware applications, showing up as the original characters.

> See msg120414 for my tests with WriteConsoleOutputW

Even if it handled encoding correctly, WriteConsoleOutputW (http://msdn.microsoft.com/en-us/library/ms687404%28v=vs.85%29.aspx) would not be the right API to use in any case, because it prints to a rectangle of characters without scrolling. WriteConsoleW does scroll in the same way that printing to a console output stream normally would. (Redirection to a non-console stream can be detected and handled differently, as the code in unicode2.py does.)
[issue410547] os.statvfs support for Windows
David-Sarah Hopwood david-sa...@jacaranda.org added the comment:

Don't use win32file.GetDiskFreeSpace; the underlying Windows API only supports drives up to 2 GB (http://blogs.msdn.com/b/oldnewthing/archive/2007/11/01/5807020.aspx). Use GetDiskFreeSpaceEx, as the code I linked to does.

I'm not sure it makes sense to provide an exact clone of os.statvfs, since some of the statvfs fields don't have equivalents that are obtainable by any Windows API as far as I know. What *would* make sense is a cross-platform way to get total disk space, and the space free for root/Administrator and for the current user. This would actually be somewhat easier to use on Unix as well.

Anyway, here's some code for Windows that only uses ctypes (whichdir should be Unicode):

    from ctypes import WINFUNCTYPE, windll, POINTER, byref, c_ulonglong
    from ctypes.wintypes import BOOL, DWORD, LPCWSTR

    # http://msdn.microsoft.com/en-us/library/aa383742%28v=VS.85%29.aspx
    PULARGE_INTEGER = POINTER(c_ulonglong)

    # http://msdn.microsoft.com/en-us/library/aa364937%28VS.85%29.aspx
    GetDiskFreeSpaceExW = WINFUNCTYPE(BOOL, LPCWSTR, PULARGE_INTEGER,
                                      PULARGE_INTEGER, PULARGE_INTEGER)(
        ("GetDiskFreeSpaceExW", windll.kernel32))

    # http://msdn.microsoft.com/en-us/library/ms679360%28v=VS.85%29.aspx
    GetLastError = WINFUNCTYPE(DWORD)(("GetLastError", windll.kernel32))

    # (This might put up an error dialog unless
    # SetErrorMode(SEM_FAILCRITICALERRORS | SEM_NOOPENFILEERRORBOX)
    # has been called.)

    n_free_for_user = c_ulonglong(0)
    n_total = c_ulonglong(0)
    n_free = c_ulonglong(0)
    retval = GetDiskFreeSpaceExW(whichdir,
                                 byref(n_free_for_user),
                                 byref(n_total),
                                 byref(n_free))
    if retval == 0:
        raise OSError("Windows error %d attempting to get disk statistics for %r"
                      % (GetLastError(), whichdir))

    free_for_user = n_free_for_user.value
    total = n_total.value
    free = n_free.value

--
versions: +Python 2.7, Python 3.3 -Python 2.6
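For comparison, Python 3.3 and later provide shutil.disk_usage, which covers the total/free part of the request on both Windows and Unix. The wrapper below is only an illustration (the name disk_stats is made up for the example), and it does not report a separate root/Administrator figure.

```python
import shutil

def disk_stats(path):
    """Return (total_bytes, free_bytes) for the filesystem containing path."""
    usage = shutil.disk_usage(path)   # named tuple: total, used, free
    return usage.total, usage.free

total, free = disk_stats(".")
print("total=%d free=%d" % (total, free))
```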
[issue1602] windows console doesn't print utf8 (Py30a2)
David-Sarah Hopwood david-sa...@jacaranda.org added the comment:

It is certainly possible to write Unicode to the console successfully using WriteConsoleW. This works regardless of the console code page, including 65001. The code at http://tahoe-lafs.org/trac/tahoe-lafs/browser/src/allmydata/windows/fixups.py does so (it's for Python 2.x, but you'd be calling WriteConsoleW from C anyway).

WriteConsoleW has one bug that I know of, which is that it fails when writing more than 26608 characters at once (http://tahoe-lafs.org/trac/tahoe-lafs/ticket/1232). That's easy to work around by limiting the amount of data passed in a single call.

Fonts are not Python's problem, but encoding is. It doesn't make sense to fail to output the right characters just because some users might not have selected fonts that can display those characters. This bug should be reopened.

(For completeness, it is possible to display Unicode on the console using fonts other than Lucida Console and Consolas, but it requires a registry hack: http://stackoverflow.com/questions/878972/windows-cmd-encoding-change-causes-python-crash/3259271#3259271 .)

--
nosy: +davidsarah
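The workaround of limiting the data passed in a single call amounts to slicing the text before each write. A minimal sketch follows; the name console_chunks and the default of 10000 characters are assumptions for illustration, chosen so that even text made entirely of non-BMP characters (two UTF-16 code units each) stays under the 26608 figure quoted above.

```python
def console_chunks(text, max_chars=10000):
    """Yield successive pieces of text, each at most max_chars code points long."""
    for start in range(0, len(text), max_chars):
        yield text[start:start + max_chars]

# Each piece would then be passed to one WriteConsoleW call.
pieces = list(console_chunks("x" * 25000))
print([len(p) for p in pieces])
```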
[issue1602] windows console doesn't print utf8 (Py30a2)
David-Sarah Hopwood david-sa...@jacaranda.org added the comment:

Glenn Linderman wrote:
> I skipped the unmangling of command-line arguments, because it produced an error I didn't understand, about needing a buffer protocol.

If I understand correctly, that part isn't needed on Python 3 because issue2128 is already fixed there.
[issue2128] sys.argv is wrong for unicode strings
David-Sarah Hopwood david-sa...@jacaranda.org added the comment:

The following code is being used to work around this issue for Python 2.x in Tahoe-LAFS:

    # This works around http://bugs.python.org/issue2128.
    GetCommandLineW = WINFUNCTYPE(LPWSTR)(("GetCommandLineW", windll.kernel32))
    CommandLineToArgvW = WINFUNCTYPE(POINTER(LPWSTR), LPCWSTR, POINTER(c_int)) \
                            (("CommandLineToArgvW", windll.shell32))

    argc = c_int(0)
    argv_unicode = CommandLineToArgvW(GetCommandLineW(), byref(argc))

    argv = [argv_unicode[i].encode('utf-8') for i in range(0, argc.value)]

    if not hasattr(sys, 'frozen'):
        # If this is an executable produced by py2exe or bbfreeze, then it will
        # have been invoked directly. Otherwise, argv_unicode[0] is the Python
        # interpreter, so skip that.
        argv = argv[1:]

        # Also skip option arguments to the Python interpreter.
        while len(argv) > 0:
            arg = argv[0]
            if not arg.startswith("-") or arg == "-":
                break
            argv = argv[1:]
            if arg == '-m':
                # sys.argv[0] should really be the absolute path of the module source,
                # but never mind
                break
            if arg == '-c':
                argv[0] = '-c'
                break

--
nosy: +davidsarah
versions: +Python 2.5, Python 2.6, Python 2.7
[issue2128] sys.argv is wrong for unicode strings
David-Sarah Hopwood david-sa...@jacaranda.org added the comment:

Sorry, missed out the imports:

    from ctypes import WINFUNCTYPE, windll, POINTER, byref, c_int
    from ctypes.wintypes import LPWSTR, LPCWSTR
[issue410547] os.statvfs support for Windows
David-Sarah Hopwood david-sa...@jacaranda.org added the comment:

> Is there a portable way to get the available disk space by now?

No, but http://tahoe-lafs.org/trac/tahoe-lafs/browser/src/allmydata/util/fileutil.py?rev=4894#L308 might be helpful (uses pywin32).

--
nosy: +davidsarah
[issue6058] Add cp65001 to encodings/aliases.py
David-Sarah Hopwood david-sa...@jacaranda.org added the comment:

This problem causes {{{os.getcwdu()}}} to fail when the console code page is set to 65001 (always, I think):

{{{
t:\>ver

Microsoft Windows [Version 6.0.6002]

t:\>chcp
Active code page: 65001

t:\>python -c "import os; print os.getcwdu()"
Traceback (most recent call last):
  File "<string>", line 1, in <module>
LookupError: unknown encoding: cp65001

t:\>chcp 1252
Active code page: 1252

t:\>python -c "import os; print os.getcwdu()"
t:\
}}}

Incidentally, I don't agree that this codepage needs to be distinguished from UTF-8. The deviations in the Microsoft codec are just their bugs. There is only one correct way to encode/decode UTF-8, and cp65001 is supposed to be UTF-8 according to Microsoft (e.g. http://msdn.microsoft.com/en-us/library/86hf4sb8%28en-US,VS.80%29.aspx ).

--
nosy: +davidsarah
[issue6058] Add cp65001 to encodings/aliases.py
David-Sarah Hopwood david-sa...@jacaranda.org added the comment:

I said:
> There is only one correct way to encode/decode UTF-8.

This is true modulo differences in the treatment of initial byte order marks.
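The byte-order-mark caveat can be seen with Python's own codecs: "utf-8" keeps an initial BOM as U+FEFF, while "utf-8-sig" strips it on decoding and adds it on encoding. A short illustration (variable names are arbitrary):

```python
data = b"\xef\xbb\xbfabc"              # UTF-8 encoded BOM followed by "abc"

plain = data.decode("utf-8")           # BOM survives as the character U+FEFF
stripped = data.decode("utf-8-sig")    # BOM is recognised and removed

print(repr(plain), repr(stripped))
print("abc".encode("utf-8-sig"))       # encoding with utf-8-sig prepends the BOM
```

Apart from that leading mark, both codecs produce and accept the same byte sequences.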
[issue6058] Add cp65001 to encodings/aliases.py
David-Sarah Hopwood david-sa...@jacaranda.org added the comment:

I meant to say that the os.getcwdu() test in msg119440 was done with Windows native Python 2.6.2.
[issue6058] Add cp65001 to encodings/aliases.py
David-Sarah Hopwood david-sa...@jacaranda.org added the comment:

Oops, false alarm. python -c "import os; print repr(os.getcwdu())" works as expected, so the exception is part of issue 1602.

(My comment about there being no need to distinguish this codepage from UTF-8 stands.)
[issue7952] fileobject.c can switch between fread and fwrite without an intervening flush or seek, invoking undefined behaviour
New submission from David-Sarah Hopwood david-sa...@jacaranda.org:

The C standard (any version, or POSIX), says in the description of fopen that:

{{{
When a file is opened with update mode ( '+' as the second or third character in the mode argument), both input and output may be performed on the associated stream. However, the application shall ensure that output is not directly followed by input without an intervening call to fflush() or to a file positioning function ( fseek(), fsetpos(), or rewind()), and input is not directly followed by output without an intervening call to a file positioning function, unless the input operation encounters end-of-file.
}}}

Objects/fileobject.c makes calls to fread and fwrite without taking this into account. So calls from Python to read or write methods of a file object opened in any "rw" mode may invoke undefined behaviour. It isn't reasonable to rely on Python code to avoid this situation, even if it were considered acceptable in C. (Arguably this is a bug in the C standard, but it is unlikely to be fixed there or in POSIX, because of differences in philosophy about language safety.)

To fix this, fileobject.c should keep track of whether the last I/O operation was an input or output, and perform a call to fflush whenever an input follows an output or vice versa. This should not significantly affect performance in any case where the behaviour was previously defined (in cases where it wasn't, correctness trumps performance). fflush does not affect the file position and should have no other negative effect, because the stdio implementation is free to flush buffered data at any time (and certainly on I/O operations).

Despite the undefined behaviour, I don't currently know of a platform where this would lead to an exploitable security bug. I'm marking this issue as security-relevant anyway, because it may prevent analysing whether Python applications behave securely only on the basis of documented behaviour.

--
components: IO
messages: 99483
nosy: davidsarah
severity: normal
status: open
title: fileobject.c can switch between fread and fwrite without an intervening flush or seek, invoking undefined behaviour
type: security
versions: Python 2.7
[issue7952] fileobject.c can switch between fread and fwrite without an intervening flush or seek, invoking undefined behaviour
David-Sarah Hopwood david-sa...@jacaranda.org added the comment:

Correction: when input is followed by output, the call needed to avoid undefined behaviour has to be to a file positioning function (fseek, fsetpos, or rewind, but not fflush). Since fileobject.c does not use wide I/O operations, it should be sufficient to use _portable_fseek(fp, 0, SEEK_CUR), which is a positioning call that leaves the file position unchanged.

(_portable_fseek may call some function that is not strictly defined to be a file positioning function, e.g. fseeko() or fseek64(). However, it would be insane for a stdio implementation not to treat those as being file positioning functions as far as the intent of the C or POSIX standards is concerned.)
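The bookkeeping proposed for fileobject.c (remember the last operation, and synchronise when the direction changes) can be modelled in a few lines. This is only a model of the rule quoted from the standard, written in Python for illustration; the function name required_sync and the returned strings are made up, and the model conservatively ignores the end-of-file exception.

```python
def required_sync(last_op, next_op):
    """Return the stdio call needed between two operations on an update-mode stream.

    last_op and next_op are "read", "write", or None (no previous operation).
    """
    if last_op == "write" and next_op == "read":
        # Output followed by input: fflush() or a positioning call is enough.
        return "fflush or file positioning call"
    if last_op == "read" and next_op == "write":
        # Input followed by output: only a positioning call will do.
        return "file positioning call"
    return None  # same direction, or first operation: nothing required

for pair in [("write", "read"), ("read", "write"), ("read", "read")]:
    print(pair, "->", required_sync(*pair))
```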