[issue46954] Awaiting multiple times on same task increases memory usage unboundedly
New submission from David M. : Awaiting multiple times on a single task that failed with an exception results in an unbounded increase in memory usage. Enough repeated "await"s of the task can result in an OOM. The same pattern on a task that didn't raise an exception behaves as expected. The attached short script ends up using more than 1GB of memory in less than a minute. -- components: asyncio files: multi_await_exception.py messages: 414739 nosy: asvetlov, davidmanzanares, yselivanov priority: normal severity: normal status: open title: Awaiting multiple times on same task increases memory usage unboundedly versions: Python 3.10, Python 3.11, Python 3.7, Python 3.8, Python 3.9 Added file: https://bugs.python.org/file50664/multi_await_exception.py ___ Python tracker <https://bugs.python.org/issue46954> ___ ___ Python-bugs-list mailing list Unsubscribe: https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
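The attached script isn't reproduced in the message. A minimal sketch of the pattern (my own hypothetical code, not the reporter's file) shows the mechanism: the task stores a single exception object, every await re-raises it, and each re-raise adds frames to the exception's __traceback__ chain, so memory grows with every await.

```python
import asyncio

async def fails():
    raise ValueError("boom")

def tb_depth(exc):
    # count the frames currently linked on the exception's traceback
    depth, tb = 0, exc.__traceback__
    while tb is not None:
        depth, tb = depth + 1, tb.tb_next
    return depth

async def main():
    task = asyncio.ensure_future(fails())
    depths = []
    for _ in range(5):
        try:
            await task          # re-raises the same stored exception object
        except ValueError as exc:
            depths.append(tb_depth(exc))
    return depths

depths = asyncio.run(main())
print(depths)  # on affected versions, the depth grows with every await
```

Scale the loop up and the accumulated frames (and the locals they keep alive) account for the reported OOM; a task that finished without an exception never re-raises, which is why it behaves normally.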
[issue21830] ssl.wrap_socket fails on Windows 7 when specifying ca_certs
David M Noriega added the comment: Oops, that's what I get for running with scissors. Yes, the cert file is in PEM format. It's the same file in use on my LDAP server and all my servers and workstations that authenticate against it. I have an existing Python 2.x script using the python-ldap (different from python3-ldap) module that uses this exact same file and works correctly. I've tested with the socket code above on Python 2 and 3 and it works on my Linux systems and on Windows XP. I only get this error on a Windows 7 system. -- ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue21830 ___ ___ Python-bugs-list mailing list Unsubscribe: https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue21830] ssl.wrap_socket fails on Windows 7 when specifying ca_certs
New submission from David M Noriega: When trying to use the python3-ldap package on Windows 7, I found I could not get a TLS connection to work and traced it to its use of ssl.wrap_socket. Trying out the following simple socket test fails:

import socket
import ssl

sock = socket.socket()
sock.connect((host.name, 636))
ssl = ssl.wrap_socket(sock, cert_reqs=ssl.CERT_REQUIRED, ca_certs=r"C:\path\to\cert\file")

Traceback (most recent call last):
  File "<pyshell#4>", line 1, in <module>
    sock = ssl.wrap_socket(sock, cert_reqs=ssl.CERT_REQUIRED, ca_certs=r"F:\Downloads\csbc-cacert.pem")
  File "C:\Python34\lib\ssl.py", line 888, in wrap_socket
    ciphers=ciphers)
  File "C:\Python34\lib\ssl.py", line 511, in __init__
    self._context.load_verify_locations(ca_certs)
ssl.SSLError: unknown error (_ssl.c:2734)

This code works on Windows XP (and of course Linux), and I'm able to use getpeercert(). A workaround I was able to figure out was to use ssl.SSLContext in conjunction with the Windows central certificate store: by first loading my CA cert into the trusted root cert store, I could use SSLContext.load_default_certs() to create an SSL socket. -- components: Windows messages: 221373 nosy: David.M.Noriega priority: normal severity: normal status: open title: ssl.wrap_socket fails on Windows 7 when specifying ca_certs versions: Python 3.4 ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue21830 ___ ___ Python-bugs-list mailing list Unsubscribe: https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
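The described workaround can be sketched like this (a sketch under my own assumptions: the host name is hypothetical, and on non-Windows systems load_default_certs() reads the OpenSSL default locations rather than the Windows store):

```python
import socket
import ssl

# Build a verifying SSLContext instead of calling ssl.wrap_socket() directly.
ctx = ssl.SSLContext(ssl.PROTOCOL_SSLv23)
ctx.verify_mode = ssl.CERT_REQUIRED
ctx.load_default_certs()  # on Windows, pulls CAs from the system cert stores

# Hypothetical LDAPS connection; requires the CA cert to already be
# installed in the trusted root store:
# sock = socket.socket()
# sock.connect(("ldap.example.com", 636))
# tls = ctx.wrap_socket(sock)
```

This sidesteps load_verify_locations() (the call that fails in the traceback above) entirely, since the context gets its CAs from the OS.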
Re: can't find win32api from embedded pyrun call
the problem was: ActivePython does not install debug libraries, so you must link with release libraries in your project. but if you run the debug version, you're linking against debug libraries which conflict with the ones linked to by python. fixed by running the release version. basically, it's not possible to debug with ActivePython due to ActiveState not including debug libs. grr -- https://mail.python.org/mailman/listinfo/python-list
Re: can't find win32api from embedded pyrun call
I find I'm having this problem, but the solution you found isn't quite specific enough for me to be able to follow it. I'm embedding Python27 in my app. I have users install ActivePython27 in order to take advantage of python in my app, so the python installation can't be touched as it's on a user's machine. When I attempt to do:

import win32api

i get this:

Traceback (most recent call last):
  File "startup.py", line 5, in <module>
ImportError: DLL load failed: The specified module could not be found.

Someone suggested I manually load the dependent libraries in the correct order, like this:

import pywintypes
import pythoncom
import win32api

but then i get this:

Traceback (most recent call last):
  File "startup.py", line 3, in <module>
  File "C:\Python27\lib\site-packages\win32\lib\pywintypes.py", line 124, in <module>
    __import_pywin32_system_module__("pywintypes", globals())
  File "C:\Python27\lib\site-packages\win32\lib\pywintypes.py", line 64, in __import_pywin32_system_module__
    import _win32sysloader
ImportError: DLL load failed: The specified module could not be found.

the ultimate goal here is actually to do this:

from win32com.client.gencache import EnsureDispatch

which currently yields:

Traceback (most recent call last):
  File "startup.py", line 3, in <module>
  File "C:\Python27\lib\site-packages\win32com\__init__.py", line 5, in <module>
    import win32api, sys, os
ImportError: DLL load failed: The specified module could not be found.

So, if anyone has any idea, that would be super duper great. thanks so much! notes: my paths are definitely set correctly -- https://mail.python.org/mailman/listinfo/python-list
Re: can't find win32api from embedded pyrun call
note that when the script is called, i DO see this in the output window:

'kJams 2 Debug.exe': Loaded 'C:\Python27\Lib\site-packages\win32\win32api.pyd'
'kJams 2 Debug.exe': Loaded 'C:\Windows\SysWOW64\pywintypes27.dll'
'kJams 2 Debug.exe': Unloaded 'C:\Python27\Lib\site-packages\win32\win32api.pyd'
'kJams 2 Debug.exe': Unloaded 'C:\Windows\SysWOW64\pywintypes27.dll'

-- https://mail.python.org/mailman/listinfo/python-list
Re: can't get utf8 / unicode strings from embedded python
I am very sorry that I have offended you to such a degree you feel it necessary to publicly eviscerate me. Perhaps I could have worded it like this: So far I have not seen any troubles including unicode characters in my strings, they *seem* to be fine for my use-case. What kind of trouble has been seen with this by others? Really, I wonder why you are so angry at me for having made a mistake? I'm going to guess that you don't have kids. -- http://mail.python.org/mailman/listinfo/python-list
Re: can't get utf8 / unicode strings from embedded python
Thank you for your thoughtful and thorough response. I now understand much better what you (and apparently the others) were warning me against and I will certainly consider that moving forward. I very much appreciate your help as I learn about python and embedding and all these crazy encoding problems. What do kids have to do with this? When a person has children, they quickly learn the best way to deal with someone who seems to be not listening or having a tantrum: show understanding and compassion, restraint and patience, as you, in the most neutral way that you can, gently but firmly guide said person back on track. You learn that if you instead express your frustration at said person, it never, ever helps the situation, and only causes more hurt to be spread around to the very people you are ostensibly attempting to help. "Are you an adult or a child?" Perhaps my comment was lost in translation, but this is rather the question that I was obliquely asking you. *wink right back* In any case I thank you for your help, which has in fact been quite great! My demo script is working, and I know now to properly advise my script writers regarding how to properly encode strings. -- http://mail.python.org/mailman/listinfo/python-list
Re: can't get utf8 / unicode strings from embedded python
i am already doing (3), and all is working perfectly. bytestring literals are fine, i'm not sure what this trouble is that you speak of. note that i'm not using PyRun_AnyFile(), i'm loading the script myself, assumed as utf8 (which was my original problem, i had assumed it was macRoman), then calling PyRun_SimpleString(). it works flawlessly now, on both mac and windows. -- http://mail.python.org/mailman/listinfo/python-list
Re: can't get utf8 / unicode strings from embedded python
i'm sorry this is so confusing, let me try to re-state the problem in as clear a way as i can. I have a C++ program, with very well tested unicode support. All logging is done in utf8. I have conversion routines that work flawlessly, so i can assure you there is nothing wrong with logging and unicode support in the underlying program. I am embedding python 2.7 into the program, and extending python with routines in my C++ program. I have a script, encoded in utf8, and *marked* as utf8 with this line: # -*- coding: utf-8 -*- In that script, i have inline unicode text. When I pass that text to my C++ program, the Python interpreter decides that these bytes are macRoman, and handily converts them to unicode. To compensate, i must convert these macRoman characters encoded as utf8, back to macRoman, then interpret them as utf8. In this way i can recover the original unicode. When i return a unicode string back to python, i must do the reverse so that Python gets back what it expects. This is not related to printing, or sys.stdout, it does happen with that too but focusing on that is a red-herring. Let's focus on just passing a string into C++ then back out. This would all actually make sense IF my script was marked as being macRoman even tho i entered UTF8 Characters, but that is not the case. Let's prove my statements. 
Here is the script, *interpreted* as MacRoman: http://karaoke.kjams.com/screenshots/bugs/python_unicode/script_as_macroman.png and here it is again *interpreted* as utf8: http://karaoke.kjams.com/screenshots/bugs/python_unicode/script_as_utf8.png

here is the string conversion code:

SuperString ScPyObject::GetAs_String()
{
	SuperString str;	// underlying format of SuperString is unicode

	if (PyUnicode_Check(i_objP)) {
		ScPyObject utf8Str(PyUnicode_AsUTF8String(i_objP));

		str = utf8Str.GetAs_String();
	} else {
		const UTF8Char *bytes_to_interpetZ = uc(PyString_AsString(i_objP));

		// the Set call *interprets*, does not *convert*
		str.Set(bytes_to_interpetZ, kCFStringEncodingUTF8);

		// str is now unicode characters which *represent* macRoman characters
		// so *convert* these to actual macRoman
		// fyi: Update_utf8 means "convert to this encoding and
		// store the resulting bytes in the variable named utf8"
		str.Update_utf8(kCFStringEncodingMacRoman);

		// str is now unicode characters converted from macRoman
		// so *reinterpret* them as UTF8
		// FYI, we're just taking the pure bytes that are stored in the utf8 variable
		// and *interpreting* them to this encoding
		bytes_to_interpetZ = str.utf8().c_str();
		str.Set(bytes_to_interpetZ, kCFStringEncodingUTF8);
	}

	return str;
}

PyObject* PyString_FromString(const SuperString str)
{
	SuperString localStr(str);

	// localStr is the real, actual unicode string
	// but we must *interpret* it as macRoman, then take these macRoman characters
	// and convert them to unicode for Python to get it
	const UTF8Char *bytes_to_interpetZ = localStr.utf8().c_str();

	// take the utf8 bytes (actual utf8 representation of string)
	// and say "no, these bytes are macRoman"
	localStr.Set(bytes_to_interpetZ, kCFStringEncodingMacRoman);

	// okay so now we have unicode of MacRoman characters (!?)
	// return the underlying utf8 bytes of THAT as our string
	return PyString_FromString(localStr.utf8Z());
}

And here are the results from running the script:

18: ---
18: Original string: frøânçïé
18: converting...
18: it worked: frøânçïé
18: ---
18: ---
18: Original string: 控件
18: converting...
18: it worked: 控件
18: ---

Now the thing that absolutely utterly baffles me (if i'm not baffled enough) is that i get the EXACT same results on both Mac and Windows. Why do they both insist on interpreting my script's bytes as MacRoman? -- http://mail.python.org/mailman/listinfo/python-list
Re: can't get utf8 / unicode strings from embedded python
fair enough. I can provide further proof of strangeness. here is my latest script: this is saved on disk as a UTF8 encoded file, and when viewing as UTF8, it shows the correct characters.

==
# -*- coding: utf-8 -*-
import time, kjams, kjams_lib

def log_success(msg, successB, str):
	if successB:
		print msg + " worked: " + str
	else:
		print msg + " failed: " + str

def do_test(orig_str):
	cmd_enum = kjams.enum_cmds()

	print "---"
	print "Original string: " + orig_str
	print "converting..."

	oldstr = orig_str;
	newstr = kjams_lib.do_command(cmd_enum.kScriptCommand_Unicode_Test, oldstr)
	log_success("first", oldstr == newstr, newstr);

	oldstr = unicode(orig_str, "UTF-8")
	newstr = kjams_lib.do_command(cmd_enum.kScriptCommand_Unicode_Test, oldstr)
	newstr = unicode(newstr, "UTF-8")
	log_success("second", oldstr == newstr, newstr);

	oldstr = unicode(orig_str, "UTF-8")
	oldstr.encode("UTF-8")
	newstr = kjams_lib.do_command(cmd_enum.kScriptCommand_Unicode_Test, oldstr)
	newstr = unicode(newstr, "UTF-8")
	log_success("third", oldstr == newstr, newstr);

	print "---"

def main():
	do_test("frøânçïé")
	do_test("控件")

#-
if __name__ == "__main__":
	main()
==

and the latest results:

20: ---
20: Original string: frøânçïé
20: converting...
20: first worked: frøânçïé
20: second worked: frøânçïé
20: third worked: frøânçïé
20: ---
20: ---
20: Original string: 控件
20: converting...
20: first worked: 控件
20: second worked: 控件
20: third worked: 控件
20: ---

now, given the C++ source code, this should NOT work, given that i'm doing some crazy re-coding of the bytes. so, you see, it does not matter whether i pass unicode strings or regular strings, they all translate to the same, weird macroman. for completeness, here is the C++ code that the script calls:

===
case kScriptCommand_Unicode_Test: {
	pyArg = iterP.NextArg_OrSyntaxError();

	if (pyArg.get()) {
		SuperString str = pyArg.GetAs_String();

		resultObjP = PyString_FromString(str);
	}
	break;
}
===

-- http://mail.python.org/mailman/listinfo/python-list
Re: can't get utf8 / unicode strings from embedded python
i got it!! OMG! so sorry for the confusion, but i learned a lot, and i can share the result: the CORRECT code *was* what i had assumed. the Python side has always been correct (no need to put u in front of strings, it is known that the bytes are utf8 bytes). it was my "run script" function which read in the file. THAT was what was reinterpreting the utf8 bytes as macRoman (on both platforms). correct code below:

SuperString ScPyObject::GetAs_String()
{
	SuperString str;

	if (PyUnicode_Check(i_objP)) {
		ScPyObject utf8Str(PyUnicode_AsUTF8String(i_objP));

		str = utf8Str.GetAs_String();
	} else {
		// calling uc() on this means "assume this is utf8"
		str.Set(uc(PyString_AsString(i_objP)));
	}

	return str;
}

PyObject* PyString_FromString(const SuperString str)
{
	return PyString_FromString(str.utf8Z());
}

-- http://mail.python.org/mailman/listinfo/python-list
Re: can't get utf8 / unicode strings from embedded python
> I see you are using Python 2, correct? Firstly, in Python 2, the compiler assumes that the source code is encoded in ASCII

gar, i must have been looking at doc for v3, as i thought it was all assumed to be utf8

> # -*- coding: utf-8 -*-

okay, did that, still no change

> you need to use u"..." delimiters for Unicode, otherwise the results you get are completely arbitrary and depend on the encoding of your terminal.

okay, well, i'm on a mac, and not using terminal at all. but if i were, it would be utf8. but it's still not flying :(

> For example, if I set my terminal encoding to IBM-850

okay how do you even do that? this is not an interactive session, this is embedded python, within a C++ app, so there's no terminal. but that is a good question: all the docs say "default encoding" everywhere (as in "If string is a Unicode object, this function computes the default encoding of string and operates on that"), but fail to specify just HOW i can set the default encoding. if i could just say "hey, default encoding is utf8", i think i'd be done?

> So change the line of code to: print u"frøânçïé"

okay, sure... but i get the exact same results

> Those two changes ought to fix the problem, but if they don't, try setting your terminal encoding to UTF-8 as well

well, i'm not sure what you mean by that. i don't have a terminal here. i'm logging to a utf8 log file (when i print) but what it *actually* prints is this:

print "frøânçïé" --> fr√∏√¢n√ß√Ø√©

> It's hard to say what *exactly* is happening here, because you don't explain how the python print statement somehow gets into your C++ Log code. Do I guess right that it catches stdout?

yes, i'm redirecting stdout to my own custom print class, and then from that function i call into my embedded C++ print function

> If so, then what I expect is happening is that Python has read in the source code of "print ~" with "~" as a bunch of junk bytes, and then your terminal is displaying those junk bytes according to whatever encoding it happens to be using. Since you are seeing this: fr√∏√¢n√ß√Ø√© my guess is that you're using a Mac, and the encoding is set to the MacRoman encoding. Am I close?

you hit the nail on the head there, i think. using that as a hint, i took this text "fr√∏√¢n√ß√Ø√©" and pasted it into a macRoman document, then *reinterpreted* it as UTF8, and voilà: "frøânçïé". so, it seems that i AM getting my utf8 bytes, but i'm getting them converted to macRoman. huh? where is macRoman specified, and how do i tell whoever that is that i really *really* want utf8? i think that's the missing golden ticket -- http://mail.python.org/mailman/listinfo/python-list
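The round trip described above is easy to reproduce in Python 3 (a sketch, not code from the thread): UTF-8 bytes misread as MacRoman produce exactly the junk shown, and re-encoding as MacRoman then decoding as UTF-8 recovers the original.

```python
# Reproduce the mojibake from the thread: UTF-8 bytes decoded as MacRoman.
original = "frøânçïé"

# What a MacRoman-assuming reader shows when handed UTF-8 bytes:
mojibake = original.encode("utf-8").decode("mac_roman")
print(mojibake)  # fr√∏√¢n√ß√Ø√©

# The recovery trick from the post: take those characters' MacRoman bytes
# and reinterpret them as UTF-8 to get the original text back.
recovered = mojibake.encode("mac_roman").decode("utf-8")
print(recovered)  # frøânçïé
```

Each accented character's two UTF-8 bytes begin with 0xC3, which MacRoman renders as "√", hence the telltale "√" before every garbled letter.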
Re: can't get utf8 / unicode strings from embedded python
> What _are_ you using?

i have scripts in a file, that i am invoking into my embedded python within a C++ program. there is no terminal involved. the print statement has been redirected (via sys.stdout) to my custom print class, which does not specify encoding, so i tried the suggestion above to set it:

static const char *s_RedirectScript =
	"import kEmbeddedModuleName\n"
	"import sys\n"
	"\n"
	"class CustomPrintClass:\n"
	"	def write(self, stuff):\n"
	"		kEmbeddedModuleName.kCustomPrint(stuff)\n"
	"class CustomErrClass:\n"
	"	def write(self, stuff):\n"
	"		kEmbeddedModuleName.kCustomErr(stuff)\n"
	"sys.stdout = CustomPrintClass()\n"
	"sys.stderr = CustomErrClass()\n"
	"sys.stdout.encoding = 'UTF-8'\n"
	"sys.stderr.encoding = 'UTF-8'\n";

but it didn't help. I'm still getting back a string that is a utf-8 string of characters that, if converted to macRoman and then interpreted as UTF8, shows the original, correct string. who is specifying macRoman, and where, and how do i tell whoever that is that i really *really* want utf8? -- http://mail.python.org/mailman/listinfo/python-list
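For reference, here is a pure-Python (Python 3) sketch of such a redirector, not the thread's exact code: the class name and the `sink` callback are mine. It sidesteps the "default encoding" question by decoding bytes explicitly instead of relying on an `encoding` attribute that nothing consults.

```python
import sys

class CustomPrint:
    """File-like stdout replacement that hands text to a sink callback."""
    def __init__(self, sink):
        self.sink = sink
        self.encoding = "utf-8"   # advertised to callers that check it

    def write(self, stuff):
        if isinstance(stuff, bytes):          # be explicit, never guess
            stuff = stuff.decode(self.encoding)
        self.sink(stuff)

    def flush(self):
        pass

captured = []
old = sys.stdout
sys.stdout = CustomPrint(captured.append)
print("frøânçïé")          # arrives at the sink as real text, not mojibake
sys.stdout = old
```

Note that merely assigning `sys.stdout.encoding = 'UTF-8'`, as in the embedded script above, just sets an ordinary attribute; nothing in the interpreter re-encodes output because of it.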
can't get utf8 / unicode strings from embedded python
note everything works great if i use Ascii, but: in my utf8-encoded script i have this:

print "frøânçïé"

in my embedded C++ i have this:

PyObject* CPython_Script::print(PyObject *args)
{
	PyObject	*resultObjP = NULL;
	const char	*utf8_strZ = NULL;

	if (PyArg_ParseTuple(args, "s", &utf8_strZ)) {
		Log(utf8_strZ, false);
		resultObjP = Py_None;
		Py_INCREF(resultObjP);
	}

	return resultObjP;
}

Now, i know that my Log() can print utf8 (has for years, very well debugged) but what it *actually* prints is this:

print "frøânçïé" --> fr√∏√¢n√ß√Ø√©

another method i use looks like this:

kj_commands.menu("控件", "同步滑帧", "全局无滑帧")

or

kj_commands.menu(u"控件", u"同步滑帧", u"全局无滑帧")

and in my C++ i have:

SuperString ScPyObject::GetAs_String()
{
	SuperString str;

	if (PyUnicode_Check(i_objP)) {
		#if 1 // method 1
		{
			ScPyObject utf8Str(PyUnicode_AsUTF8String(i_objP));

			str = utf8Str.GetAs_String();
		}
		#elif 0 // method 2
		{
			UTF8Char *uniZ = (UTF8Char *)PyUnicode_AS_UNICODE(i_objP);

			str.assign(&uniZ[0], &uniZ[PyUnicode_GET_DATA_SIZE(i_objP)], kCFStringEncodingUTF16);
		}
		#else // method 3
		{
			UTF32Vec charVec(32768);

			CF_ASSERT(sizeof(UTF32Vec::value_type) == sizeof(wchar_t));

			PyUnicodeObject *uniObjP = (PyUnicodeObject *)(i_objP);
			Py_ssize_t sizeL(PyUnicode_AsWideChar(uniObjP, (wchar_t *)&charVec[0], charVec.size()));

			charVec.resize(sizeL);
			charVec.push_back(0);
			str.Set(SuperString(&charVec[0]));
		}
		#endif
	} else {
		str.Set(uc(PyString_AsString(i_objP)));
	}

	Log(str.utf8Z());
	return str;
}

for the string, "控件", i get: --> Êé߉ª∂

for the *unicode* string, u"控件", Methods 1, 2, and 3, i get the same thing: --> Êé߉ª∂

okay so what am i doing wrong??? -- http://mail.python.org/mailman/listinfo/python-list
Re: Raw_input with readline in a daemon thread makes terminal text disappear
Hi all, This is an old thread, but I'm having the same behavior in my terminal when I run some code but kill the process in the terminal (Ctrl-C). The code has two prime suspects (from a simple google search): 1. Creates ssh port forward via the subprocess module (http://unix.stackexchange.com/questions/4740/screen-remote-login-failure-and-disappearing-text) 2. Using the getpass module (raw_input?) Calling $ reset brings back the disappearing text, so I'm just wondering if this issue has been addressed and if so, what should I be doing that I'm not. Thank you, Dave W.

Response to post: http://mail.python.org/pipermail/python-list/2009-October/554784.html

> I'm getting input for a program while it's running by using raw_input in a loop in a separate thread. This works except for the inconvenience of not having a command history or the use of backspace etc. That can be solved by loading the readline module; however, it results in a loss of visible access to the terminal when the program ends: nothing is echoed to the screen and the history is invisible (although it is there - hitting return executes whatever should be there normally). The only way to get it back is to close the terminal and open a new one. Here is minimal code that reproduces the problem (python 2.5 on Linux):
>
> from threading import Thread
> import readline
>
> get_input = Thread(target=raw_input)
> get_input.setDaemon(True)
> get_input.start()
>
> If the thread is not set to daemon mode, there is no such problem (don't know why this makes a difference), but in the real program, it needs to be a daemon or it hangs the exit waiting for more input. Any suggestions appreciated. Thanks, John

-- http://mail.python.org/mailman/listinfo/python-list
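One common defense (a sketch under my own assumptions, not something posted in the thread; the function name is mine) is to snapshot the terminal's settings before starting the readline/raw_input thread and restore them at interpreter exit, which is the same repair `reset` performs by hand:

```python
import atexit
import sys
import termios

def preserve_terminal(fd=None):
    """Save the tty settings on fd and restore them when the process exits.

    Returns the saved settings, or None if fd is not a terminal.
    """
    if fd is None:
        fd = sys.stdin.fileno()
    try:
        saved = termios.tcgetattr(fd)
    except termios.error:
        return None  # stdin is a pipe or file, nothing to protect
    atexit.register(termios.tcsetattr, fd, termios.TCSADRAIN, saved)
    return saved

# call this once before starting the input thread:
# preserve_terminal()
```

A daemon thread killed mid-read never gets a chance to let readline undo its raw-mode changes, which is why the atexit hook has to do it from the main thread.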
Re: PyArg_ParseTuple() when the type could be anything?
i was able to get what i wanted by simply iterating over the tuple instead of using ParseTuple, then just query the type, then convert the type to C and move on to the next. totally great, now i can pass N different argument types to a single function, and have the C side deal gracefully with whatever types are sent. -- http://mail.python.org/mailman/listinfo/python-list
PyArg_ParseTuple() when the type could be anything?
I'd like to be able to use PyArg_ParseTuple() in a generic way. for example, i'd like to have all commands start with 1 integer parameter, and this commandID will inform me of what parameters come next (via LUT). knowing that i can then call ParseTuple again with the proper parameters. like this:

if (PyArg_ParseTuple(args, "i|", &commandID)) {
	switch (commandID) {
		case cmd_with_str: {
			const char *strZ = NULL;

			if (PyArg_ParseTuple(args, "is", &commandID, &strZ)) {
				// do something with string
			}
			break;
		}

		case cmd_with_float: {
			float valF = -1;

			if (PyArg_ParseTuple(args, "if", &commandID, &valF)) {
				// do something with float
			}
			break;
		}
	}
}

is there a way to achieve this? the "i|" at the start is not working -- http://mail.python.org/mailman/listinfo/python-list
Re: how to package embedded python?
okay, well that might turn out to be useful, except i don't quite know how to use it, and there are no from scratch instructions. i managed to download py2exe-0.6.9.zip and unzip it, but how does one install this package? (yes, still a newb at that) then, once installed, how do i say include the entire world instead of just mymodule ? cuz the point of embedding python on my app is that the end-user can run any script at all, not just one module. -- http://mail.python.org/mailman/listinfo/python-list
Re: embedding: how to create an idle handler to allow user to kill scripts?
Okay, i'm really surprised nobody knows how to do this. and frankly i'm amazed at the utter lack of documentation. but i've figured it out, and it's all working beautifully. if you want the code, go here: http://karaoke.kjams.com/wiki/Python -- http://mail.python.org/mailman/listinfo/python-list
Re: how to package embedded python?
yes, i've looked there, and all over google. i'm quite expert at embedding at this point. however nowhere i have looked has had instructions for this: how you package up your .exe with all the python modules necessary to actually run on a user's system that does not have python installed. on mac, it's trivial: all macs come with python, there is nothing i need to include with my app and it just works. on windows: if you don't include the proper DLLs and/or whatnot, then the app will complain about missing DLLs on startup. What DLLs must i include? where are the instructions? -- http://mail.python.org/mailman/listinfo/python-list
Re: how to package embedded python?
nooobody knw the trouble a s... -- http://mail.python.org/mailman/listinfo/python-list
embedding: how to create an idle handler to allow user to kill scripts?
in my C++ app, on the main thread i init python, init threads, then call PyEval_SaveThread(), since i'm not going to do any more python on the main thread. then when the user invokes a script, i launch a preemptive thread (boost::threads), and from there, i have this:

static int CB_S_Idle(void *in_thiz)
{
	CT_RunScript *thiz((CT_RunScript *)in_thiz);

	return thiz->Idle();
}

int Idle()
{
	int resultI = 0;
	OSStatus err = noErr;

	ERR(i_taskRecP->MT_UpdateData(i_progData));

	if (err) {
		resultI = -1;
	}

	ERR(ScheduleIdleCall());
	return err;
}

int ScheduleIdleCall()
{
	int resultI(Py_AddPendingCall(CB_S_Idle, this));
	CFAbsoluteTime timeT(CFAbsoluteTimeGetCurrent());
	SuperString str;

	str.Set(timeT, SS_Time_LOG);
	Logf("$$$ Python idle: (%d) %s\n", resultI, str.utf8Z());
	return resultI;
}

virtual OSStatus operator()(OSStatus err)
{
	ScPyGILState sc;

	ERR(ScheduleIdleCall());
	ERR(PyRun_SimpleString(i_script.utf8Z()));
	return err;
}

so, my operator() gets called, and i try to schedule an Idle call, which succeeds, then i run my script. however, the CB_S_Idle() never gets called? the MT_UpdateData() function returns an error if the user had canceled the script. must i schedule a run-loop on the main thread or something to get it to be called? -- http://mail.python.org/mailman/listinfo/python-list
embedded python and threading
in my app i initialize python on the main thread, then immediately call PyEval_SaveThread() because i do no further python stuff on the main thread. then, for each script i want to run, i use boost::threads to create a new thread, then on that thread i ensure the GIL, do my stuff, then release it. so, to test concurrency, on my first background thread, i do an infinite loop that just logs "i'm alive", then calls sleep(0.25), so that thread continues to run forever (with its GIL ensured). according to the doc: "In order to emulate concurrency of execution, the interpreter regularly tries to switch threads", so i figure i can run another thread that does a single print statement: ensure gil, print my thing, release gil. and this DOES run. however, after releasing its gil, i guess the interpreter gets back to the first background thread, but then has this error immediately:

9: Traceback (most recent call last):
9:   File "<string>", line 70, in ?
9:   File "<string>", line 55, in main
9: AttributeError: 'builtin_function_or_method' object has no attribute 'sleep'

suddenly the sleep module has been unloaded?? huh? i thought the thread state had been preserved? -- http://mail.python.org/mailman/listinfo/python-list
Re: how to package embedded python?
does nobody know how to do this? does nobody know where proper documentation on this is? -- http://mail.python.org/mailman/listinfo/python-list
Re: embedded python and threading
okay, i have simplified it: here is the code

==
import time

def main():
	while True:
		print "i'm alive"
		time.sleep(0.25)

#-
if __name__ == "__main__":
	main()
==

the new error is:

==
9: Traceback (most recent call last):
9:   File "<string>", line 10, in ?
9:   File "<string>", line 6, in main
9: AttributeError: 'builtin_function_or_method' object has no attribute 'sleep'
==

-- http://mail.python.org/mailman/listinfo/python-list
Re: embedded python and threading
no, there is no time.py anywhere (except perhaps as the actual python library originally imported) did you understand that the function works perfectly, looping as it should, up until the time i run a second script on a separate thread? -- http://mail.python.org/mailman/listinfo/python-list
Re: embedded python and threading
DOH! as my second thread, i had been using a sample script that i had copy-pasted without much looking at it. guess what? it prints the time. and yes, it did "from time import time", which explains it all. thanks for the hints here, that helped me figure it out! -- http://mail.python.org/mailman/listinfo/python-list
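The failure mode is easy to reproduce in plain Python (a sketch of the mechanism, independent of the embedding): scripts run via PyRun_SimpleString all execute in the same __main__ namespace, so one script's `from time import time` rebinds the very name another script is using as the module.

```python
# Two "scripts" sharing one namespace, the way embedded scripts run via
# PyRun_SimpleString all share __main__.
shared = {}

exec("import time", shared)
exec("delay = time.sleep", shared)        # fine: 'time' is the module here

exec("from time import time", shared)     # second script rebinds 'time'
                                          # to the builtin function

try:
    exec("time.sleep(0)", shared)         # first script's code now breaks
except AttributeError as err:
    print(err)  # 'builtin_function_or_method' object has no attribute 'sleep'
```

So nothing was unloaded and the thread state was fine; the global name `time` simply no longer referred to the module.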
Re: how: embed + extend to control my running app?
Okay the link problem was solved: i had installed a 64bit python and my app is 32bit. i'm using the ActivePython installer from here: http://www.activestate.com/activepython/downloads it seems that now the problem is that this does not install the _d (debug) versions of the .lib files. :( does anyone know how to get or create the _d version of the .lib out of the ActivePython installation? -- http://mail.python.org/mailman/listinfo/python-list
how to package embedded python?
what must i include in my app package if i'm embedding python? i tried including *everything* in the DLLs directory, but my app still crashes as soon as i attempt to initialize python. this is on a system that does not have python installed, as most of my users won't have it. is it actually a requirement that they first install python? (cuz it does work then) -- http://mail.python.org/mailman/listinfo/python-list
Re: how: embed + extend to control my running app?
well, umm, gosh, now i feel quite silly. that was easy. okay that's done. next: i'd like to redirect the output of any print statements to my C function:

void Log(const unsigned char *utf8_cstrP);

on the mac, python output (sys.stdout) goes into the debug console if you're in the debugger, and to the console app if not. On windows, i don't think it goes anywhere at all? So: i really want it to go to my own log file (via my Log() function). now, can i specify "please output to this FILE*"? i looked at all the python c headers but found nothing about redirecting the output. I see PySys_GetFile() which will get what it points to, but what i want is a PySys_SetFile() so i can set it. the only alternative seems to be:

PyObject *logObjectP = create ???;
ERR(PySys_SetObject("stdout", logObjectP));

if that's the only way, how to create the logObjectP such that it redirects the write() python function to my Log() C function? i tried this:

const char *s_printFunc =
	"import sys\n"
	"class CustomPrint():\n"
	"	def __init__(self):\n"
	"		self.old_stdout=sys.stdout\n"
	"\n"
	"	def write(self, text):\n"
	"		self.old_stdout.write('foobar')\n"
	"		text = text.rstrip()\n"
	"		if len(text) == 0:\n"
	"			return\n"
	"		self.old_stdout.write('custom Print---' + text + '\\n')\n";

OSStatus CPython_PreAlloc(const char *utf8Z)
{
	OSStatus err = noErr;
	PyCompilerFlags flags;
	PyObject *logObjectP = NULL;

	Py_SetProgramName(const_cast<char *>(utf8Z));
	Py_Initialize();

	flags.cf_flags = PyCF_SOURCE_IS_UTF8;
	logObjectP = Py_CompileStringFlags(s_printFunc, "CustomPrint", Py_single_input, &flags);
	ERR_NULL(logObjectP, tsmUnsupScriptLanguageErr);

	if (!err) {
		ERR(PySys_SetObject("stdout", logObjectP));
		ERR(PySys_SetObject("stderr", logObjectP));
		Py_DECREF(logObjectP);
	}

	return err;
}

void CPython_PostDispose()
{
	Py_Finalize();
}

void CPython_Test()
{
	PyRun_SimpleString(
		"from time import time, ctime\n"
		"print 'Today is', ctime(time())\n");
}

and when i run CPython_Test(), there is no output at all. If i comment out the entire Py_CompileStringFlags() line, then the output works fine (going to stdout as expected), so i'm not sure what i'm doing wrong -- http://mail.python.org/mailman/listinfo/python-list
Re: how: embed + extend to control my running app?
i don't use stdout in my own code, my code goes to my own log file. i want the output from any python code to go to my existing log file, so log statements from my app and any python code are intermingled in that one file. my updated code is here, which now bridges my python print function to my C function: http://karaoke.kjams.com/wiki/Python but it seems that my custom s_printFunc is never called ? -- http://mail.python.org/mailman/listinfo/python-list
Re: how: embed + extend to control my running app?
http://karaoke.kjams.com/wiki/Python nevermind, i got it, it's working now (see link for code) -- http://mail.python.org/mailman/listinfo/python-list
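The working code lives behind the link, but the general shape of the Python side of such a redirector — a file-like object whose write() forwards complete lines to a logging callback (here just a list append, standing in for the C-level Log() bridge) — can be sketched like this; the class and names are hypothetical, not the poster's actual code:

```python
import sys

class LogWriter:
    """File-like object that forwards whole lines of print output to a callback."""

    def __init__(self, log_func):
        self._log = log_func
        self._buf = ""

    def write(self, text):
        # print() calls write() several times per statement, so buffer
        # until a newline arrives, then emit one complete line.
        self._buf += text
        while "\n" in self._buf:
            line, _, self._buf = self._buf.partition("\n")
            self._log(line)

    def flush(self):
        if self._buf:
            self._log(self._buf)
            self._buf = ""

lines = []                      # stands in for the C Log() function
sys.stdout = LogWriter(lines.append)
print("Today is", "now")
sys.stdout = sys.__stdout__     # restore
```

In the embedded case, `lines.append` would be replaced by an extension function that crosses into C and calls Log().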
Re: how: embed + extend to control my running app?
Now for Windows: same thing, i think i must create a .dll, right? you should already have a python.dll in your installation i can find python27.lib in the libs folder, but there is no python27_d.lib, and there is no python27.dll in the DLLs folder? are there instructions for creating (or finding) these for Windows? -- http://mail.python.org/mailman/listinfo/python-list
Re: how: embed + extend to control my running app?
update: okay so the python27.dll is in /windows/system32 so ignore that. i've set my include directory correctly, so i can compile. i've set my additional libraries directory to the libs directory (where the .lib files are). (note: NOT including Lib directory, cuz that's full of .py files and folders) (note: NOT including DLLs directory, cuz, why would i?) No need to specify additional dependencies for the .lib file, cuz the pyconfig.h file does that. but there is no python27_d.dll anywhere to be found, so i hacked pyconfig.h to get rid of the "_d". so it all compiles. but it won't link:

    LNK2001: unresolved external symbol __imp___Py_RefTotal
    LNK2001: unresolved external symbol __imp___Py_NoneStruct
    LNK2019: unresolved external symbol __imp__PyArg_ParseTuple
    LNK2019: unresolved external symbol __imp__PyFloat_FromDouble
    LNK2019: unresolved external symbol __imp__PyString_FromString
    LNK2019: unresolved external symbol __imp__PyRun_SimpleStringFlags
    LNK2019: unresolved external symbol __imp__Py_InitModule4TraceRefs
    LNK2019: unresolved external symbol __imp__Py_Initialize
    LNK2019: unresolved external symbol __imp__Py_SetProgramName
    LNK2019: unresolved external symbol __imp__Py_Finalize
    LNK2019: unresolved external symbol __imp__PyRun_SimpleFileExFlags

what, pray tell, am i doing wrong? *hopeful face* -- http://mail.python.org/mailman/listinfo/python-list
Re: how: embed + extend to control my running app?
i'm targeting Mac and Windows. Let's skip the thing about it should work when my app isn't running, just assume it's going to be embedded, no pipes or sockets necessary. For Mac, I understand i need to create (?) a python.dylib, but i find no directions for that at the expected location: http://docs.python.org/2/extending/embedding.html is there some wiki page explaining how to create this for use in MacOS / Xcode? Now for Windows: same thing, i think i must create a .dll, right? Is there a tutorial for that? After that, i can link to these items, then in my C++ app, just #include Python.h and i've covered step 1. -- http://mail.python.org/mailman/listinfo/python-list
how: embed + extend to control my running app?
i'd like my app to be available to python while it's running. for example, say my app is FooBar.app. when my FooBar.app is running, now there is a python interface available, and the user can write python scripts to make use of it. with their scripts, they can control my running application. when FooBar.app is NOT running, perhaps making use of any of the python functions of FooBar.app would either return an error, or possibly launch FooBar.app? or do nothing since it's not running? can boost::python help with this? i've never worked with extending or embedding python, so any help would be super great -- http://mail.python.org/mailman/listinfo/python-list
Re: Understanding other people's code
Literally any idea will help, pen and paper, printing off all the code and doing some sort of highlighting session - anything! I keep reading bits of code and thinking well where the hell has that been defined and what does it mean to find it was inherited from 3 modules up the chain. I really need to get a handle on how exactly all this slots together! Any techniques,tricks or methodologies that people find useful would be much appreciated. I'd highly recommend Eclipse with PyDev, unless you have some strong reason not to. That's what I use, and it saves pretty much all of those what's this thing? problems, as well as lots of others... DC -- http://mail.python.org/mailman/listinfo/python-list
Re: Functional vs. Object oriented API
Roy Smith r...@panix.com As part of our initial interview screen, we give applicants some small coding problems to do. One of the things we see a lot is what you could call Java code smell. This is our clue that the person is really a Java hacker at heart who just dabbles in Python but isn't really fluent. ... It's not just LongVerboseFunctionNamesInCamelCase(). Nor is it code that looks like somebody bought the Gang of Four patterns book and is trying to get their money's worth out of the investment. The real dead giveaway is when they write classes which contain a single static method and nothing else. I may have some lingering Java smell myself, although I've been working mostly in Python lately, but my reaction here is that's really I don't know BASIC smell or something; a class that contains a single static method and nothing else isn't wonderful Java design style either. That being said, I've noticed in my own coding, it's far more often that I start out writing some functions and later regret not having initially made it a class, than the other way around. That's as true in my C++ code as it is in my Python. Definitely. Once you start having state (i.e. data) and behavior (i.e. functions) in the same thought, then you need a class. If you find yourself passing the same bunch of variables around to multiple functions, that's a hint that maybe there's a class struggling to be written. And I think equally to the point, even if you have only data, or only functions, right now, if the thing in question has that thing-like feel to it :) you will probably find yourself with both before you're done, so you might as well make it a class now... DC -- http://mail.python.org/mailman/listinfo/python-list
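The "same bunch of variables passed around" hint can be made concrete with a tiny hypothetical example (the names are made up for illustration): a (width, height) pair threading through every function call is the class struggling to be written.

```python
# Before: the same (width, height) pair is passed to every function.
def area(width, height):
    return width * height

def perimeter(width, height):
    return 2 * (width + height)

# After: the shared state and the behavior that uses it live together.
class Rectangle:
    def __init__(self, width, height):
        self.width = width
        self.height = height

    def area(self):
        return self.width * self.height

    def perimeter(self):
        return 2 * (self.width + self.height)
```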
Re: Cannot run a single MySQLdb execute....
Νίκος Γκρ33κ nikos.gr...@gmail.com : "What paramstyle are you using?" Yes it is Chris, but i'm not sure what exactly you are asking me. Please, if you can put it even simpler for me, thank you. For instance:

    >>> import MySQLdb
    >>> MySQLdb.paramstyle
    'format'

FWIW and HTH, DC -- http://mail.python.org/mailman/listinfo/python-list
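A paramstyle of 'format' means MySQLdb expects %s-style placeholders in execute(). To keep the illustration runnable without a MySQL server, the sketch below uses the stdlib sqlite3 driver instead, whose paramstyle is 'qmark' (? placeholders); the table and values are hypothetical. The point is the same for either driver: the values travel separately from the SQL text, in the placeholder style the driver declares.

```python
import sqlite3

# sqlite3 declares its placeholder style, just as MySQLdb does.
# (MySQLdb: 'format' -> %s placeholders; sqlite3: 'qmark' -> ? placeholders.)
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT)")
conn.execute("INSERT INTO users VALUES (?)", ("nikos",))
row = conn.execute("SELECT name FROM users WHERE name = ?", ("nikos",)).fetchone()
```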
[issue16856] Segfault from calling repr() on a dict with a key whose repr raise an exception
New submission from David M. Cooke: The following segfaults:

    class A(int):
        def __repr__(self):
            raise Exception()

    a = A()
    d = {a: 1}
    repr(d)

This is with Python 3.3.0, running on Mac OS 10.7.5, from MacPorts: Python 3.3.0 (default, Sep 29 2012, 08:16:08) [GCC 4.2.1 Compatible Apple Clang 3.1 (tags/Apple/clang-318.0.58)] on darwin -- components: Interpreter Core messages: 178997 nosy: david.m.cooke priority: normal severity: normal status: open title: Segfault from calling repr() on a dict with a key whose repr raise an exception type: crash versions: Python 3.3 ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue16856 ___ ___ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
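On interpreters where this crash has been fixed, the exception from the key's __repr__ simply propagates out of repr(d) instead of segfaulting. A quick check (using ValueError so the assertion is specific, rather than the bare Exception of the report):

```python
class A(int):
    def __repr__(self):
        raise ValueError("broken repr")

d = {A(): 1}
try:
    repr(d)          # on fixed interpreters this raises, not crashes
    raised = False
except ValueError:
    raised = True
```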
Re: Getting a TimedRotatingFileHandler not to put two dates in the same file?
d...@davea.name On 10/23/2012 11:23 AM, David M Chess wrote: We have a TimedRotatingFileHandler with when='midnight' You give us no clue what's in this class, or how it comes up with the filenames used. Sorry if I was unclear. This isn't my own subclass of TimedRotatingFileHandler or anything, this is the bog-standard logging.handlers.TimedRotatingFileHandler I'm talking about. So all clues about what's in the class, and how it comes up with the filenames used, is available at http://docs.python.org/library/logging.handlers.html#timedrotatingfilehandler :) The specific Python version involved here is Python 2.6.6 (r266:84297, Aug 24 2010, 18:46:32), to the extent that that matters... This works great, splitting the log information across files by date, as long as the process is actually up at midnight. But now the users have noticed that if the process isn't up at midnight, they can end up with lines from two (or I guess potentially more) dates in the same log file. Is there some way to fix this, either with cleverer arguments into the TimedRotatingFileHandler, or by some plausible subclassing of it or its superclass? Tx, DC http://mail.python.org/mailman/listinfo/python-list
Re: Getting a TimedRotatingFileHandler not to put two dates in the same file?
w...@mac.com Something like: Does a log file exist? - No - First run; create log file, continue | Yes | Read backwards looking for date change, copy lines after change to new file, delete from old file. Yep, I'm concluding that also. It just wasn't clear to me from the documentation whether or not the existing TimedRotatingFileHandler had any "at startup, see if we missed any rollovers, and do them now if so" function, or if there was some known variant that does. The answer, apparently, being nope. :) Shouldn't be that hard to write, so that's probably what we'll do. DC --- -- http://mail.python.org/mailman/listinfo/python-list
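A sketch of the "see if we missed any rollovers at startup" subclass being discussed. This assumes the base class seeds self.rolloverAt from the existing file's mtime (which recent CPython versions do); on versions where it does not, the same check would have to be done by hand against os.stat(). The class name is made up.

```python
import glob
import os
import tempfile
import time
from logging.handlers import TimedRotatingFileHandler

class CatchUpTimedRotatingFileHandler(TimedRotatingFileHandler):
    """Rotate a stale log file at startup instead of waiting for the
    first emit after the next time boundary.

    Assumes the base class computes self.rolloverAt from the existing
    file's mtime (true in recent CPython).
    """

    def __init__(self, filename, **kwargs):
        TimedRotatingFileHandler.__init__(self, filename, **kwargs)
        if int(time.time()) >= self.rolloverAt:
            # The file on disk belongs to a previous interval:
            # roll it over now so dates don't mix in one file.
            self.doRollover()

# Demo: a log file last written two days ago gets rotated on startup.
logdir = tempfile.mkdtemp()
path = os.path.join(logdir, "app.log")
with open(path, "w") as f:
    f.write("old line\n")
stale = time.time() - 2 * 86400
os.utime(path, (stale, stale))

handler = CatchUpTimedRotatingFileHandler(path, when="midnight")
handler.close()
rotated = glob.glob(path + ".*")   # the renamed, dated file
```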
A lock that prioritizes acquire()s?
Okay, next silly question. :) We have a very simple multi-threaded system where a request comes in, starts running in a thread, and then (zero, one, or two times per request) gets to a serialization point, where the code does:

    with lock:
        do_critical_section_stuff_that_might_take_awhile()

and then continues. Which is almost the same as:

    lock.acquire()
    try:
        do_critical_section_stuff_that_might_take_awhile()
    finally:
        lock.release()

Now we discover that It Would Be Nice if some requests got priority over others, as in something like:

    lock.acquire(importance=request.importance)
    try:
        do_critical_section_stuff_that_might_take_awhile()
    finally:
        lock.release()

and when lock.release() occurs, the next thread that gets to run is one of the most important ones currently waiting in acquire() (that's the exciting new thing). Other requirements are that the code to do this be as simple as possible, and that it not mess anything else up. :) My first thought was something like a new lock-ish class that would do roughly:

    class PriorityLock(object):
        def __init__(self):
            self._lock = threading.Lock()
            self._waiter_map = {}  # maps TIDs to importance

        def acquire(self, importance=0):
            this_thread = threading.currentThread()
            self._waiter_map[this_thread] = importance  # I want in
            while True:
                self._lock.acquire()
                if max(self._waiter_map.values()) <= importance:  # we win
                    del self._waiter_map[this_thread]  # not waiting anymore
                    return  # return with lock acquired
                self._lock.release()  # We are not most impt: release/retry

        def release(self):
            self._lock.release()

(Hope the mail doesn't garble that too badly.) Basically the acquire() method just immediately releases and tries again if it finds that someone more important is waiting. I think this is semantically correct, as long as the underlying lock implementation doesn't have starvation issues, and it's nice and simple, but on the other hand it looks eyerollingly inefficient. 
Seeking any thoughts on other/better ways to do this, or whether the inefficiency will be too eyerolling if we get say one request per second with an average service time a bit under a second but maximum service time well over a second, and most of them are importance zero, but every (many) seconds there will be one or two with higher importance. Tx, DC --- -- http://mail.python.org/mailman/listinfo/python-list
Re: A lock that prioritizes acquire()s?
Lovely, thanks for the ideas! I remember considering having release() pick the next thread to notify, where all the waiters were sitting on separate Conditions or whatever; not sure why I didn't pursue it to the end. Probably distracted by something shiny; or insufficient brainpower. :) DC -- -- http://mail.python.org/mailman/listinfo/python-list
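The idea mentioned here — release() picks the next thread, with each waiter parked on its own Condition — can be sketched as follows. This is a hypothetical implementation, not code from the thread: a heap orders waiters by importance (with a sequence number as FIFO tie-break), all Conditions share one mutex, and release() hands the lock directly to the waiter it wakes, so there is no busy retry.

```python
import heapq
import itertools
import threading
import time

class PriorityLock:
    """Lock whose release() hands off to the highest-importance waiter."""

    def __init__(self):
        self._mutex = threading.Lock()
        self._locked = False
        self._waiters = []              # heap of (-importance, seq, cond, granted)
        self._seq = itertools.count()   # FIFO tie-break among equal importances

    def acquire(self, importance=0):
        with self._mutex:
            if not self._locked:
                self._locked = True
                return
            granted = [False]
            cond = threading.Condition(self._mutex)
            heapq.heappush(self._waiters, (-importance, next(self._seq), cond, granted))
            while not granted[0]:
                cond.wait()

    def release(self):
        with self._mutex:
            if self._waiters:
                _, _, cond, granted = heapq.heappop(self._waiters)
                granted[0] = True       # lock stays held; ownership transfers
                cond.notify()
            else:
                self._locked = False

# Demo: with the lock held, queue a low- and a high-importance waiter;
# the high-importance one should run first when the lock is released.
lock = PriorityLock()
order = []

def worker(importance, name):
    lock.acquire(importance=importance)
    order.append(name)
    lock.release()

lock.acquire()                          # hold the lock while waiters queue up
threads = [threading.Thread(target=worker, args=(0, "low")),
           threading.Thread(target=worker, args=(5, "high"))]
for t in threads:
    t.start()
time.sleep(0.3)                         # let both block inside acquire()
lock.release()                          # highest importance goes first
for t in threads:
    t.join()
```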
Getting a TimedRotatingFileHandler not to put two dates in the same file?
We have a TimedRotatingFileHandler with when='midnight'. This works great, splitting the log information across files by date, as long as the process is actually up at midnight. But now the users have noticed that if the process isn't up at midnight, they can end up with lines from two (or I guess potentially more) dates in the same log file. Is there some way to fix this, either with cleverer arguments into the TimedRotatingFileHandler, or by some plausible subclassing of it or its superclass? Or am I misinterpreting the symptoms somehow? Tx much! DC -- http://mail.python.org/mailman/listinfo/python-list
Re: problem with ThreadingTCPServer Handler
jorge jaoro...@estudiantes.uci.cu I'm programming a server that most send a message to each client connected to it and nothing else. this is obviously a base of what i want to do. the thing is, I made a class wich contains the Handler class for the ThreadingTCPServer and starts the server but i don't know how can i access the message variable contained in the class from the Handler since I have not to instance the Handler by myself. The information about the request is in attributes of the Handler object. From the socketserver docs ( http://docs.python.org/library/socketserver.html ): RequestHandler.handle() This function must do all the work required to service a request. The default implementation does nothing. Several instance attributes are available to it; the request is available as self.request; the client address asself.client_address; and the server instance as self.server, in case it needs access to per-server information. If that's what you meant by the message variable contained in the class. If, on the other hand, you meant that you want to pass some specific data into the handler about what it's supposed to be doing, I've generally stashed that in the server, since the handler can see the server via self.server. DC -- http://mail.python.org/mailman/listinfo/python-list
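The "stash it in the server, reach it via self.server" suggestion looks like this in practice; the class names and the message are made up for illustration:

```python
import socket
import socketserver
import threading

class MessageHandler(socketserver.BaseRequestHandler):
    def handle(self):
        # Per-server data is reachable from the handler via self.server.
        self.request.sendall(self.server.message)

class MessageServer(socketserver.ThreadingTCPServer):
    allow_reuse_address = True

    def __init__(self, addr, message):
        socketserver.ThreadingTCPServer.__init__(self, addr, MessageHandler)
        self.message = message          # the data every handler should see

server = MessageServer(("127.0.0.1", 0), b"hello clients\n")
threading.Thread(target=server.serve_forever, daemon=True).start()

with socket.create_connection(server.server_address) as client:
    data = client.recv(1024)

server.shutdown()
server.server_close()
```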
Re: Py3.3 unicode literal and input()
If you (the programmer) want a function that asks the user to enter a literal at the input prompt, you'll have to write a post-processing for it, which looks for prefixes, for quotes, for backslashes, etc., and encodes the result. There very well may be such a decoder in the Python library, but input does nothing of the kind. As it says at the end of eval() (which you definitely don't want to use here due to side effects): See ast.literal_eval() for a function that can safely evaluate strings with expressions containing only literals. DC -- http://mail.python.org/mailman/listinfo/python-list
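The ast.literal_eval() route mentioned above handles exactly the prefixes, quotes, and backslashes case (the sample input string is made up):

```python
import ast

# Simulated user input: the raw source text someone would type at an
# input() prompt, including quotes and escape sequences.
raw = r"'caf\u00e9\n'"

# literal_eval parses it as a Python literal -- escapes are decoded --
# without eval()'s side-effect risks.
value = ast.literal_eval(raw)
```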
bus errors when the network interface is reset?
We have a system running Python 2.6.6 under RHEL 6.1. A bunch of processes spend most of their time sitting in a BaseHTTPServer.HTTPServer waiting for requests. Last night an update pushed out via xcat whimsically restarted all of the network interfaces, and at least some of our processes died with bus errors (i.e. no errors or exceptions reflected up to the Python level, just a crash). This is just my initial looking into this. Seeking opinions of the form, say: Yeah, that happens, don't reset the network interfaces. Yeah, that happens, and you can prevent the crash by doing X in your OS. Yeah, that happens, and you can prevent the crash by doing X in your Python code. That wouldn't happen if you upgraded S to version V. That sounds like a new bug and/or more information is needed; please provide copious details including at least X, Y, and Z. Any thoughts or advice greatly appreciated. DC David M. Chess IBM Watson Research Center -- http://mail.python.org/mailman/listinfo/python-list
[issue14704] NameError Issue in Multiprocessing
New submission from David M. Rogers dmr...@sandia.gov: Python Devs, There is an issue relating to variable lookup using exec from within multiprocessing's fork()-ed process. I'm attempting to use the forked process as a generic remote python shell, but exec is unable to reach variables from within functions. This issue makes it impossible to define a function which uses un-passed variables defined in the remote process. The simplest way to reproduce the error is:

    --- err.py ---
    from multiprocessing import Process, Pipe

    def run_remote(con, name):
        my_name = name
        for i in range(2):
            code = con.recv()
            exec code

    me, he = Pipe()
    p = Process(target=run_remote, args=(he, "Sono Inglese de Gerrards Cross."))
    p.start()
    me.send("print my_name")  # works
    me.send("""
    def show_name():
        print my_name
    show_name()
    """)  # doesn't work
    --- end err.py ---

This program prints:

    $ python2.6 err.py
    Sono Inglese de Gerrards Cross.
    Process Process-1:
    Traceback (most recent call last):
      File "/sw/lib/python2.6/multiprocessing/process.py", line 232, in _bootstrap
        self.run()
      File "/sw/lib/python2.6/multiprocessing/process.py", line 88, in run
        self._target(*self._args, **self._kwargs)
      File "err.py", line 7, in run_remote
        exec code
      File "<string>", line 4, in <module>
      File "<string>", line 3, in show_name
    NameError: global name 'my_name' is not defined

I'm using Mac OSX (10.6.8) and Python 2.6.5 (r265:79063, Sep 23 2010, 14:05:02) [GCC 4.2.1 (Apple Inc. build 5646)] on darwin. The issue (with the same traceback) also occurs for: Python 2.7 (r27:82500, Sep 29 2010, 15:34:46) [GCC 4.2.1 (Apple Inc. build 5646)] on darwin. Using exactly the same set of exec calls locally results in the correct behavior.

    --- noerr.py ---
    my_name = "Sono Inglese de Gerrards Cross."

    exec "print my_name"
    exec """
    def show_name():
        print my_name
    show_name()
    """
    --- end noerr.py ---

-- components: None messages: 159764 nosy: frobnitzem priority: normal severity: normal status: open title: NameError Issue in Multiprocessing versions: Python 2.6 ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue14704 ___ ___ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
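The NameError here is arguably not multiprocessing-specific: exec-ing inside a function uses that function's locals, and a def compiled by exec cannot close over those locals, so the nested lookup of my_name fails. A common workaround (a sketch, not a proposed fix to the report) is to give exec one explicit dict to serve as the namespace for every snippet, so definitions and lookups agree:

```python
# One explicit namespace shared by all exec'd snippets: names the host
# seeds (my_name) or that one snippet defines are module-level globals
# for functions defined in later snippets.
env = {"my_name": "Sono Inglese de Gerrards Cross.", "out": []}

exec("out.append(my_name)", env)           # direct lookup: works
exec("def show_name():\n"
     "    out.append(my_name)\n"
     "show_name()\n", env)                 # nested lookup: now works too
```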
Re: Windows 7 : any problems installing or running Python ?
Hello Skippy, In response to your message Windows 7 : any problems installing or running Python ? I found posted at (http://mail.python.org/pipermail/python-list/2009-August/1215524.html), I've got to say that I can't seem to get any version of Python to work on my computer. I have a Toshiba Satellite laptop running Windows 7 Home Premium with an AMD Turion II Dual-Core processor. I've tried all of the versions listed at http://python.org/download with no success. They install fine but when I try to run the IDLE (Python GUI), it does nothing at all. Do you have any suggestions that might help me out here? I would really appreciate your input. Thank you, David M Covey Sr. ad...@daffitt.com -- http://mail.python.org/mailman/listinfo/python-list
[issue4758] Python 3.x internet documentation needs work
David M. Beazley beaz...@users.sourceforge.net added the comment: An apology on the delay. Things have been rather hectic. Regarding a patch, I don't really have a patch so much as a suggested procedure. Basically, I'm suggesting that the maintainers of the library documentation simply do a quick survey of network related modules and make it clear that many of the operations work on byte strings and not strings. In Python 2.X, you could get away with being a little sloppy, but in Python 3, the bytes/strings distinction becomes much more prominent. If I have time, I might be able to make a specific patch, but it probably wouldn't be until after PyCON sometime. -- ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue4758 ___ ___ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue7322] Socket timeout can cause file-like readline() method to lose data
New submission from David M. Beazley beaz...@users.sourceforge.net: Consider a socket that has had a file-like wrapper placed around it using makefile() # s is a socket created previously f = s.makefile() Now, suppose that this socket has had a timeout placed on it. s.settimeout(15) If you try to read data from f, but nothing is available. You'll eventually get a timeout. For example: f.readline() # Now, just wait Traceback (most recent call last): File stdin, line 1, in module File /Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/socket. py, line 406, in readline data = self._sock.recv(self._rbufsize) socket.timeout: timed out However, now consider the case where you're reading a line of data, but the receiver has only received a partial line and it's waiting for the rest of the data to arrive. For example, type this: f.readline() Now, go to the other end of the socket connection and send a buffer with no newline character. For example, send the message Hello. Since no newline character has been received, the readline() method will eventually fail with a timeout as before. However, if you now retry the read operation f.readline() and send more data such as the message World\n, you'll find that the Hello message gets lost. In other words, the repeated readline() operation discards any buffers corresponding to previously received line data and just returns the new data. Admittedly this is a corner case, but you probably don't want data to be discarded on a TCP connection even if a timeout occurs. Hope that makes some sense :-). (It helps to try it out). -- components: Library (Lib) messages: 95245 nosy: beazley severity: normal status: open title: Socket timeout can cause file-like readline() method to lose data type: behavior versions: Python 2.6 ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue7322 ___ ___ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
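A reader that does not discard partial data on timeout can be sketched as follows — this is a hypothetical workaround class, not the stdlib makefile() object: on timeout the exception still propagates, but bytes already received stay buffered, so a retry resumes where the last attempt left off.

```python
import socket

class ResumableLineReader:
    """readline() that keeps partial data across socket timeouts."""

    def __init__(self, sock):
        self._sock = sock
        self._buf = b""

    def readline(self):
        while b"\n" not in self._buf:
            chunk = self._sock.recv(4096)   # may raise socket.timeout;
            if not chunk:                   # _buf keeps what arrived so far
                raise EOFError("connection closed mid-line")
            self._buf += chunk
        line, _, self._buf = self._buf.partition(b"\n")
        return line + b"\n"
```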
Re: Compiler malware rebutted
On 11/13/2009 3:26 PM, Aahz wrote: Ken Thompson's classic paper on bootstrapped malware finally gets a rebuttal: http://lwn.net/Articles/360040/ thanks for pointing this out. -- david -- http://mail.python.org/mailman/listinfo/python-list
Re: (OT) Recommend FTP Client
On 11/12/2009 11:26 AM, Dave Angel wrote: Try http://fireftp.mozdev.org/ in the past i found this to be buggy. i'd recommend something different. what is your OS? -- david -- http://mail.python.org/mailman/listinfo/python-list
Re: Code that ought to run fast, but can't due to Python limitations.
Martin v. Löwis martin at v.loewis.de writes: This is a good test for Python implementation bottlenecks. Run that tokenizer on HTML, and see where the time goes. I looked at it with cProfile, and the top function that comes up for a larger document (52k) is ...validator.HTMLConformanceChecker.__iter__. [...] With this simple optimization, I get a 20% speedup on my test case. In my document, there are no attributes - the same changes should be made to attribute validation routines. I don't think this has anything to do with the case statement. I agree. I ran cProfile over just the tokenizer step; essentially tokenizer = html5lib.tokenizer.HTMLStream(htmldata) for tok in tokenizer: pass It mostly *isn't* tokenizer.py that's taking the most time, it's inputstream.py. (There is one exception: tokenizer.py:HTMLStream.__init__ constructs a dictionary of states each time -- this is unnecessary, replace all expressions like self.states[attributeName] with self.attributeNameState.) I've done several optimisations -- I'll upload the patch to the html5lib issue tracker. In particular, * The .position property of EncodingBytes is used a lot. Every self.position +=1 calls getPosition() and setPosition(). Another getPosition() call is done in the self.currentByte property. Most of these can be optimised away by using methods that move the position and return the current byte. * In HTMLInputStream, the current line number and column are updated every time a new character is read with .char(). The current position is *only* used in error reporting, so I reworked it to only calculate the position when .position() is called, by keeping track of the number of lines in previous read chunks, and computing the number of lines to the current offset in the current chunk. These give me about a 20% speedup. 
This just illustrates that the first step in optimisation is profiling :D As other posters have said, slurping the whole document into memory and using a regexp-based parser (such as pyparsing) would likely give you the largest speedups. If you want to keep the chunk-based approach, you can still use regexps, but you'd have to think about matching on chunk boundaries. One way would be to guarantee a minimum number of characters available, say 10 or 50 (unless end-of-file, of course) -- long enough such that any *constant* string you'd want to match, like "<![CDATA[", would fit inside that guaranteed length. Any arbitrary-length tokens (such as attribute names and values) would be matched, not with regexps like [a-z]+, but with [a-z]{1,10} (match [a-z] from 1 to 10 times), joining the individual matches together to make one token. Since html5lib has several implementations for several languages, it may actually be worth it to generate lexers for each language from one specification file. Take care, David M. Cooke david.m.co...@gmail.com -- http://mail.python.org/mailman/listinfo/python-list
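The "profile first" workflow described in this thread can be reduced to a few lines of stdlib cProfile; the tokenize_stub function below is a made-up stand-in for "run the tokenizer over the document":

```python
import cProfile
import io
import pstats

def tokenize_stub(data):
    # Stand-in for the real tokenizer loop being profiled.
    return sum(ord(c) for c in data)

profiler = cProfile.Profile()
profiler.enable()
tokenize_stub("x" * 100000)
profiler.disable()

# Sort by cumulative time and keep the top entries -- the hot spots.
out = io.StringIO()
pstats.Stats(profiler, stream=out).sort_stats("cumulative").print_stats(10)
report = out.getvalue()
```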
Re: itertools.intersect?
On Jun 11, 3:05 am, Chris Rebert c...@rebertia.com wrote: On Wed, Jun 10, 2009 at 5:53 PM, Mensanatormensana...@aol.com wrote: On Jun 10, 5:24 pm, David Wilson d...@botanicus.net wrote: Hi, During a fun coding session yesterday, I came across a problem that I thought was already solved by itertools, but on investigation it seems it isn't. The problem is simple: given one or more ordered sequences, return only the objects that appear in each sequence, without reading the whole set into memory. This is basically an SQL many-many join. Why not use SQL? Agreed. I seem to recall the last person asking for such a function wanted to use it to combine SQL results. My original use case was a full text indexer. Here's the code: http://code.google.com/p/ghetto-fts/ Let me invert the question and ask: why would I want to use SQL for this? Or in my own words: what kind of girly-code requires an RDBMS just to join some sequences? =) Given that Google reports 14.132 billion occurrences of the on the English web, which is about right, given that some estimate the English web at ~15 billion documents, or about 33.8 bits to uniquely identify each document, let's assume we use a 64bit integer, that's theoretically 111.7GiB of data loaded into SQL just for a single word. Introducing SQL quickly results in artificially requiring a database system, when a 15 line function would have sufficed. It also restricts how I store my data, and prevents, say, using a columnar, variable length, or delta encoding on my sequence of document IDs, which would massively improve the storage footprint (say, saving 48-56 bits per element). I'll avoid mention of the performance aspects altogether. What the hell are you thinking, David Cheers, Chris --http://blog.rebertia.com -- http://mail.python.org/mailman/listinfo/python-list
Re: itertools.intersect?
On Jun 11, 12:59 am, Jack Diederich jackd...@gmail.com wrote: On Wed, Jun 10, 2009 at 6:24 PM, David Wilsond...@botanicus.net wrote: During a fun coding session yesterday, I came across a problem that I thought was already solved by itertools, but on investigation it seems it isn't. The problem is simple: given one or more ordered sequences, return only the objects that appear in each sequence, without reading the whole set into memory. This is basically an SQL many-many join. I thought it could be accomplished through recursively embedded generators, but that approach failed in the end. After posting the question to Stack Overflow[0], Martin Geisler proposed a wonderfully succinct and reusable solution (see below, or pretty printed at the Stack Overflow URL). [snip] Here's my version; keep a list of (curr_val, iterator) tuples and operate on those.

    def intersect(seqs):
        iter_pairs = [(it.next(), it) for it in map(iter, seqs)]
        while True:
            min_val = min(iter_pairs)[0]
            max_val = max(iter_pairs)[0]
            if min_val == max_val:
                yield min_val
                max_val += 1  # everybody advances
            for i, (val, it) in enumerate(iter_pairs):
                if val < max_val:
                    iter_pairs[i] = (it.next(), it)
        # end while True

This version is a lot easier to understand. The implicit StopIteration is a double-edged sword for readability, but I like it. :) David Interestingly you don't need to explicitly catch StopIteration and return because only the top level is a generator. So both lines where it.next() are called will potentially end the loop. I also tried using a defaultdict(list) as the main structure; it worked but was uglier by far { curr_val = [it1, it2, ..]} with dels and appends. -Jack ps, woops, I forgot to hit reply all the first time. -- http://mail.python.org/mailman/listinfo/python-list
Re: itertools.intersect?
On Jun 10, 11:24 pm, David Wilson d...@botanicus.net wrote: Hi, During a fun coding session yesterday, I came across a problem that I thought was already solved by itertools, but on investigation it seems it isn't. The problem is simple: given one or more ordered sequences, return only the objects that appear in each sequence, without reading the whole set into memory. This is basically an SQL many-many join. I thought it could be accomplished through recursively embedded generators, but that approach failed in the end. After posting the question to Stack Overflow[0], Martin Geisler proposed a wonderfully succinct and reusable solution (see below, or pretty printed at the Stack Overflow URL). It is my opinion that this particular implementation is a wonderful and incredibly valid use of iterators, and something that could be reused by others, certainly least not myself again in the future. With that in mind I thought it, or something very similar, would be a great addition to the itertools module. My question then is, are there better approaches to this? The heapq-based solution at the Stack Overflow page is potentially more useful still, for its ability to operate on orderless sequences, but in that case, it might be better to simply listify each sequence, and sort it before passing to the ordered-only functions. Thanks, David. Stack Overflow page here: http://stackoverflow.com/questions/969709/joining-a-set-of-ordered-in... Sweet solution:

    import operator

    def intersect(sequences):
        """Compute intersection of sequences of increasing integers.

        >>> list(intersect([[1, 100, 142, 322, 12312],
        ...                 [2, 100, 101, 322, 1221],
        ...                 [100, 142, 322, 956, 1222]]))
        [100, 322]
        """
        iterators = [iter(seq) for seq in sequences]
        last = [iterator.next() for iterator in iterators]
        indices = range(len(iterators))
        while True:
            # The while loop stops when StopIteration is raised. The
            # exception will also stop the iteration by our caller.
            if reduce(operator.and_, [l == last[0] for l in last]):
                # All iterators contain last[0]
                yield last[0]
                last = [iterator.next() for iterator in iterators]
            # Now go over the iterators once and advance them as
            # necessary. To stop as soon as the smallest iterator is
            # exhausted we advance each iterator only once per loop
            # iteration.
            for i in indices[:-1]:
                if last[i] < last[i+1]:
                    last[i] = iterators[i].next()
                if last[i] > last[i+1]:
                    last[i+1] = iterators[i+1].next()

I found my answer: Python 2.6 introduces heapq.merge(), which is designed exactly for this. Thanks all, David. -- http://mail.python.org/mailman/listinfo/python-list
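The heapq.merge() answer the poster arrives at can be spelled out like this — a sketch assuming each input sequence is sorted and free of internal duplicates (an element then appears in the intersection exactly when it occurs once per input in the merged stream):

```python
import heapq
from itertools import groupby

def intersect(sequences):
    """Yield values present in every sorted, duplicate-free sequence."""
    n = len(sequences)
    for value, group in groupby(heapq.merge(*sequences)):
        # A value occurring n times in the merge occurs once per input.
        if sum(1 for _ in group) == n:
            yield value
```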
Re: Concurrency Email List
On 5/16/2009 5:26 PM, Aahz wrote: On Sat, May 16, 2009, Pete wrote: python-concurre...@googlegroups.com is a new email list for discussion of concurrency issues in python. Is there some reason you chose not to create a list on python.org? I'm not joining the list because Google requires that you create a login. i too would join if it was hosted at python.org, and will not if it's hosted at google for the same reason. -- david
[issue4903] binascii.crc32()
David M. Beazley beaz...@users.sourceforge.net added the comment: Placing a note in the standard library documentation would be a start. Just say in Python 3.0 it always returns the result as an unsigned integer whereas in Python 2.6 a 32-bit signed integer is returned. Although the numerical value may differ between versions, the underlying bits are the same. Use crc32() & 0xffffffff to get a consistent value (already noted). Note: Not everyone uses checksums in only a packed-binary format. Having the integer value just change across Python versions like that is a real subtle compatibility problem to point out. ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue4903 ___ ___ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
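A minimal sketch of the masking idiom described above (4157704578 is the unsigned CRC-32 of b'Hello' quoted in this issue):

```python
import binascii

# On Python 2.6, crc32() may return a negative signed value; masking
# with 0xffffffff normalizes both versions to the same unsigned bits.
checksum = binascii.crc32(b'Hello') & 0xffffffff
print(checksum)  # 4157704578
```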
[issue4903] binascii.crc32()
New submission from David M. Beazley beaz...@users.sourceforge.net: The result of binascii.crc32() is different on the same input in Python 2.6/3.0. For example:

Python 2.6:
>>> binascii.crc32('Hello')
-137262718

Python 3.0:
>>> binascii.crc32(b'Hello')
4157704578

-- components: Library (Lib) messages: 79524 nosy: beazley severity: normal status: open title: binascii.crc32() type: behavior versions: Python 2.6, Python 3.0 ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue4903 ___
[issue4903] binascii.crc32()
David M. Beazley beaz...@users.sourceforge.net added the comment: Can someone PLEASE make sure this gets documented someplace. ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue4903 ___
[issue4831] exec() behavior - revisited
David M. Beazley beaz...@users.sourceforge.net added the comment: One further followup just to make sure I'm clear. Is it always safe to pass the result of locals() into exec and extract the result as shown in my example? Since I'm writing about this in a book, I just want to make absolutely certain I know what's going on and that I don't tell people something that's completely bogus. Thanks! ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue4831 ___
[issue4831] exec() behavior - revisited
New submission from David M. Beazley beaz...@users.sourceforge.net: Please forgive me, but I'm really trying to wrap my brain around the behavior of exec() in Python 3. Here's a quote from the documentation: "In all cases, if the optional parts are omitted, the code is executed in the current scope." This is referring to the optional use of the globals/locals parameters and seems to indicate that if they're omitted the code executes in the scope where the exec() appeared. Yet, this code fails:

def foo():
    exec("a = 42")
    print(a)        # NameError: a

Now, I realize that exec() became a function in Python 3. However, regardless of that, is it really the intent that exec() not be allowed to ever modify any local variable of a function? In other words, do I really have to do this?

def foo():
    ldict = locals()
    exec("a = 42", globals(), ldict)
    a = ldict['a']
    print(a)

I submitted a bug report about this once before and it was immediately dismissed. I would appreciate some greater clarity on this matter this go around. Specifically, what is the approved way to have exec() modify the local environment of a function? -- components: Interpreter Core messages: 79059 nosy: beazley severity: normal status: open title: exec() behavior - revisited type: behavior versions: Python 3.0 ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue4831 ___
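For reference, the workaround pattern from this report can be made runnable as below. This is a sketch only; it passes a fresh dict rather than the result of locals(), which makes the data flow explicit and avoids depending on locals() semantics.

```python
def foo():
    # exec() writes its assignments into the mapping we hand it;
    # we then pull the result back out of that mapping by name.
    ldict = {}
    exec("a = 42", globals(), ldict)
    a = ldict['a']
    return a

print(foo())  # 42
```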
[issue4820] ctypes.util.find_library incorrectly documented
New submission from David M. Beazley beaz...@users.sourceforge.net: In the ctypes reference / Finding shared libraries section of the ctypes documentation, the find_library() function is described as being located in ctypes.util. However, its formal description right below that lists it as ctypes.find_library(). -- assignee: georg.brandl components: Documentation messages: 78964 nosy: beazley, georg.brandl severity: normal status: open title: ctypes.util.find_library incorrectly documented versions: Python 3.0 ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue4820 ___
[issue4782] json documentation missing load(), loads()
New submission from David M. Beazley beaz...@users.sourceforge.net: Documentation for the json module in Python 2.6 and Python 3.0 doesn't have any description for load() or loads() even though both functions are used in the examples. -- assignee: georg.brandl components: Documentation messages: 78542 nosy: beazley, georg.brandl severity: normal status: open title: json documentation missing load(), loads() versions: Python 2.6, Python 3.0 ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue4782 ___
[issue4783] json documentation needs a BAWM (Big A** Warning Message)
New submission from David M. Beazley beaz...@users.sourceforge.net: The json module is described as having an interface similar to pickle: json.dump() json.dumps() json.load() json.loads() I think it would be a WISE idea to add a huge warning message to the documentation that these functions should *NOT* be used to serialize or unserialize multiple objects on the same file stream like pickle. For example:

f = open("stuff", "w")
json.dump(obj1, f)
json.dump(obj2, f)     # NO! FLAMING DEATH!

f = open("stuff", "r")
obj1 = json.load(f)
obj2 = json.load(f)    # NO! EXTRA CRISPY FLAMING DEATH!

For one, it doesn't work. load() actually reads the whole file into a big string and tries to parse it as a single object. If there are multiple objects in the file, you get a nasty exception. Second, I'm not even sure this is technically allowed by the JSON spec. As far as I can tell, concatenating JSON objects together in the same file falls into the same category as concatenating two HTML documents together in the same file (something you just don't do). Related: json.load() should probably not be used on any streaming input source such as a file wrapped around a socket. The first thing it does is consume the entire input by calling f.read()---which is probably not what someone is expecting (and it might even cause the whole program to hang). -- assignee: georg.brandl components: Documentation messages: 78547 nosy: beazley, georg.brandl severity: normal status: open title: json documentation needs a BAWM (Big A** Warning Message) type: behavior versions: Python 2.6, Python 3.0 ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue4783 ___
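One common way around the limitation described above is to write one JSON document per line and parse each line independently. This is an illustration only; nothing in the json module's documented interface promises this behavior.

```python
import io
import json

objs = [{'x': 1}, {'y': 2}]

# Serialize: one complete JSON document per line ("JSON lines" style).
buf = io.StringIO()
for obj in objs:
    buf.write(json.dumps(obj) + '\n')

# Deserialize: each line is an independent document, so loads() never
# sees two concatenated objects at once.
buf.seek(0)
loaded = [json.loads(line) for line in buf]
print(loaded)  # [{'x': 1}, {'y': 2}]
```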
[issue4785] json.JSONDecoder() strict argument undocumented and potentially confusing
New submission from David M. Beazley beaz...@users.sourceforge.net: The strict parameter to JSONDecoder() is undocumented and is confusing because someone might assume it has something to do with the encoding parameter or the general handling of parsing errors (which it doesn't). As far as I can determine by reading the source, strict determines whether or not JSON strings are allowed to contain literal newlines in them or not. For example (note: loads() passes its parameters to JSONDecoder):

>>> s = '{"test":"Hello\nWorld"}'
>>> print(s)
{"test":"Hello
World"}
>>> json.loads(s)
Traceback (most recent call last):
  ...
  File "/tmp/lib/python3.0/json/decoder.py", line 159, in JSONString
    return scanstring(match.string, match.end(), encoding, strict)
ValueError: Invalid control character at: line 1 column 14 (char 14)
>>> json.loads(s, strict=False)
{'test': 'Hello\nWorld'}

Note in this last example how the result has the literal newline embedded in it when strict is set False. -- assignee: georg.brandl components: Documentation messages: 78550 nosy: beazley, georg.brandl severity: normal status: open title: json.JSONDecoder() strict argument undocumented and potentially confusing type: behavior versions: Python 2.6, Python 3.0 ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue4785 ___
[issue4783] json documentation needs a BAWM (Big A** Warning Message)
David M. Beazley beaz...@users.sourceforge.net added the comment: Just consider me to be an impartial outside reviewer. Hypothetically, let's say I'm a Python programmer who knows a thing or two about standard library modules (like pickle), but I'm new to JSON so I come looking at the json module documentation. The documentation tells me it uses the same interface as pickle and marshal (even naming those two modules right off the bat). So, right away, I'm thinking the module probably does all of the usual things that pickle and marshal can do. For instance, serializing multiple objects to the same stream. However, it doesn't work this way and the only way to find out that it doesn't work is to either try it and get an error, or to read the source code and figure it out. I'm not reporting this as an end-user of the json module, but as a Python book author who is trying to get things right and to be precise. I think if you're going to keep the pickle and marshal reference I would add the warning message. Otherwise, I wouldn't mention pickle or marshal at all. ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue4783 ___
[issue4783] json documentation needs a BAWM (Big A** Warning Message)
David M. Beazley beaz...@users.sourceforge.net added the comment: Thanks! Hopefully I'm not giving you too much work to do :-). Cheers, Dave ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue4783 ___
[issue4786] xml.etree.ElementTree module name in Python 3
New submission from David M. Beazley beaz...@users.sourceforge.net: Not a bug, but a question to developers: Is xml.etree.ElementTree going to be the only standard library module in Python 3.0 that doesn't follow the standard Python 3.0 module naming conventions? (e.g., socketserver, configparser, etc.). Are there any plans to rename it to xml.etree.elementtree? Just curious. -- components: Library (Lib) messages: 78560 nosy: beazley severity: normal status: open title: xml.etree.ElementTree module name in Python 3 versions: Python 3.0 ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue4786 ___
[issue4766] email documentation needs to be precise about strings/bytes
New submission from David M. Beazley beaz...@users.sourceforge.net: Documentation for the email package needs to be more clear about the usage of strings and bytes. In particular: 1. All operations that parse email messages such as message_from_file() or message_from_string() operate on *text*, not binary data. So, the file must be opened in text mode. Strings must be text strings, not binary strings. 2. All operations that set/get the payload of a message operate on byte strings. For example, using m.get_payload() on a Message object returns binary data as a byte string. Opinion: There might be other bug reports about this, but I'm not advocating that the email module should support reading messages from binary mode files or byte strings. Email and MIME were originally developed with the assumption that messages would always be handled as text. Minimally, this assumed that messages would stay intact even if processed as 7-bit ASCII. By extension, everything should still work if processed as Unicode. So, I think the use of text-mode files is entirely consistent with this if you wanted to keep the module as is. There may be some confusion on this matter because if you're reading or writing email messages (or sending them across a socket), you may encounter messages stored in the form of bytes strings instead of text. People will then wonder why a byte string can't be parsed by this module (especially given that email messages only use character values in the range of 0-127). -- assignee: georg.brandl components: Documentation messages: 78456 nosy: beazley, georg.brandl severity: normal status: open title: email documentation needs to be precise about strings/bytes type: behavior versions: Python 3.0 ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue4766 ___
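A small sketch of point 1 above: the parsing functions consume text, so the raw message below is an ordinary str (the headers and body are invented for illustration).

```python
from email import message_from_string

# message_from_string() expects text, not bytes; the file-based
# variant likewise expects a file opened in text mode.
raw = "Subject: test\n\nHello\n"
msg = message_from_string(raw)
print(msg['Subject'])             # test
print(repr(msg.get_payload()))    # 'Hello\n'
```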
[issue4767] email.mime incorrectly documented (or implemented)
New submission from David M. Beazley beaz...@users.sourceforge.net: The documentation describes classes such as

    email.mime.MIMEText()
    email.mime.MIMEMultipart()
    email.mime.MIMEApplication()
    etc...

However, it's confusing because none of these classes are actually found in email.mime. Suggest either using the full proper name:

    email.mime.text.MIMEText()

Or just using the short name along with a note saying where it's found:

    MIMEText()
    Defined in email.mime.text. Further description, blah, blah..

Note: These classes *are* defined in email.mime in Python 2.6. -- assignee: georg.brandl components: Documentation messages: 78458 nosy: beazley, georg.brandl severity: normal status: open title: email.mime incorrectly documented (or implemented) type: behavior versions: Python 3.0 ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue4767 ___
[issue4768] email.generator.Generator object bytes/str crash - b64encode() bug?
New submission from David M. Beazley beaz...@users.sourceforge.net: The email.generator.Generator class does not work correctly with message objects created with binary data (MIMEImage, MIMEAudio, MIMEApplication, etc.). For example:

>>> from email.mime.image import MIMEImage
>>> data = open("IMG.jpg", "rb").read()
>>> m = MIMEImage(data, 'jpeg')
>>> s = m.as_string()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/tmp/lib/python3.0/email/message.py", line 136, in as_string
    g.flatten(self, unixfrom=unixfrom)
  File "/tmp/lib/python3.0/email/generator.py", line 76, in flatten
    self._write(msg)
  File "/tmp/lib/python3.0/email/generator.py", line 101, in _write
    self._dispatch(msg)
  File "/tmp/lib/python3.0/email/generator.py", line 127, in _dispatch
    meth(msg)
  File "/tmp/lib/python3.0/email/generator.py", line 155, in _handle_text
    raise TypeError('string payload expected: %s' % type(payload))
TypeError: string payload expected: <class 'bytes'>

The source of the problem is rather complicated, but here is the gist of it.

1. Classes such as MIMEAudio and MIMEImage accept raw binary data as input. This data is going to be in the form of bytes.
2. These classes immediately encode the data using a base64 encoder. This encoder uses the library function base64.b64encode().
3. base64.b64encode() takes a byte string as input and returns a byte string as output. So, even after encoding, the payload of the message is of type 'bytes'.
4. When messages are generated, the method Generator._dispatch() is used. It looks at the MIME main type and subtype and tries to dispatch message processing to a handler method of the form '_handle_type_subtype'. If it can't find such a handler, it defaults to a method _writeBody(). For image and audio types, this is what happens.
5. _writeBody() is an alias for _handle_text().
6. _handle_text() crashes because it's not expecting a payload of type 'bytes'.

Suggested fix: I think the library function base64.b64encode() should return a string, not bytes. The whole point of base64 encoding is to take binary data and encode it into characters safe for inclusion in text strings. Other fixes: Modify the Generator class in email.generator to properly detect bytes and use a different _handle function for it. For instance, maybe add a _handle_binary() method. -- components: Library (Lib) messages: 78464 nosy: beazley severity: normal status: open title: email.generator.Generator object bytes/str crash - b64encode() bug? type: crash versions: Python 3.0 ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue4768 ___
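A sketch of the conversion the report argues should be automatic: base64 output is pure ASCII, so it decodes losslessly to str (the payload below is invented for illustration).

```python
import base64

payload = b'\x00binary image data\xff'
encoded = base64.b64encode(payload)    # bytes in Python 3
text = encoded.decode('ascii')         # the str form argued for above
print(type(text).__name__)             # str
print(base64.b64decode(text.encode('ascii')) == payload)  # True
```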
[issue4769] b64decode should accept strings or bytes
New submission from David M. Beazley beaz...@users.sourceforge.net: The whole point of base64 encoding is to safely encode binary data into text characters. Thus, the base64.b64decode() function should equally accept text strings or binary strings as input. For example, there is a reasonable expectation that something like this should work:

>>> x = 'SGVsbG8='
>>> base64.b64decode(x)
b'Hello'

In Python 3, you get this exception however:

>>> base64.b64decode(x)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/tmp/lib/python3.0/base64.py", line 80, in b64decode
    raise TypeError("expected bytes, not %s" % s.__class__.__name__)
TypeError: expected bytes, not str

I realize that there are encoding issues with Unicode strings, but base64 encodes everything into the first 127 ASCII characters. If the input to b64decode is a str, just do an encode('ascii') operation on it and proceed. If that fails, it wasn't valid Base64 to begin with. I can't think of any real negative impact to making this change as long as the result is still always bytes. The main benefit is just simplifying the decoding process for end-users. See issue 4768. -- components: Library (Lib) messages: 78466 nosy: beazley severity: normal status: open title: b64decode should accept strings or bytes type: behavior versions: Python 3.0 ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue4769 ___
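The proposed behavior can be prototyped in a few lines. This is a sketch; the helper name b64decode_any is invented for illustration.

```python
import base64

def b64decode_any(s):
    # Base64 text is pure ASCII, so a str input can be safely
    # encoded before handing it to the bytes-only decoder.
    if isinstance(s, str):
        s = s.encode('ascii')
    return base64.b64decode(s)

print(b64decode_any('SGVsbG8='))   # b'Hello'
print(b64decode_any(b'SGVsbG8='))  # b'Hello'
```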
[issue4769] b64decode should accept strings or bytes
David M. Beazley beaz...@users.sourceforge.net added the comment: Note: This problem applies to all of the other decoders/encoders in the base64 too (b16, b32, etc.) ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue4769 ___
[issue4770] binascii module, crazy error messages, unexpected behavior
New submission from David M. Beazley beaz...@users.sourceforge.net: See Issue 4869 for a related bug. Most of the functions in binascii are meant to go from binary data to textual representations (hex digits, base64, binhex, etc.). There are numerous problems: 1. Misleading error messages. For example:

>>> binascii.b2a_base64("Some text")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: b2a_base64() argument 1 must be string or buffer, not str

>>> binascii.crc32("Some text")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: crc32() argument 1 must be string or buffer, not str

Huh? Didn't I just pass a string? The error message should say "argument 1 must be bytes or buffer, not str". This problem shows up with most of the other encoding functions too (i.e., b2a_uu). 2. Expected behavior with encoding/decoding. The functions in this module are going from binary to text. To be consistent, I think the result of encoding operations such as b2a_uu(), b2a_base64(), should be strings, not bytes. Similarly, decoding operations are going from text back to bytes. I think the input arguments should be allowed to be str (in addition to bytes if you want). -- components: Library (Lib) messages: 78470 nosy: beazley severity: normal status: open title: binascii module, crazy error messages, unexpected behavior type: behavior versions: Python 3.0 ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue4770 ___
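For contrast, the same calls succeed once the input really is bytes. A sketch of current behavior, not an endorsement of the API:

```python
import binascii

# With bytes input, both functions work as documented.
encoded = binascii.b2a_base64(b'Some text')
print(encoded)  # b'U29tZSB0ZXh0\n'

# crc32 likewise accepts bytes and returns an integer checksum.
print(isinstance(binascii.crc32(b'Some text'), int))  # True
```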
[issue4770] binascii module, crazy error messages, unexpected behavior
David M. Beazley beaz...@users.sourceforge.net added the comment: Given the low-level nature of this module, I can understand the motivation to make it all bytes. However, I'm going to respectfully disagree with that and claim that making binascii all bytes really goes against the whole spirit of what Python 3.0 has tried to do for Unicode. For example, throughout Python, you now have a clean separation between binary data (bytes) and text data (str). Well, it's cleanly separated everywhere except in the binascii module (and base64 module) which, ironically, is all about converting between binary data and text. As it stands now, it's a huge wart IMHO. ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue4770 ___
[issue4771] Bad examples in hashlib documentation
New submission from David M. Beazley beaz...@users.sourceforge.net: The hashlib documentation has incorrect examples showing the use of the hexdigest() method:

>>> hashlib.sha224(b"Nobody inspects the spammish repetition").hexdigest()
b'a4337bc45a8fc544c03f52dc550cd6e1e87021bc896588bd79e901e2'

and this one:

>>> h = hashlib.new('ripemd160')
>>> h.update(b"Nobody inspects the spammish repetition")
>>> h.hexdigest()
b'cc4a5ce1b3df48aec5d22d1f16b894a0b894eccc'

However, the result of h.hexdigest() is of type 'str', not bytes. Actual output:

>>> h.hexdigest()
'cc4a5ce1b3df48aec5d22d1f16b894a0b894eccc'

Sure would be nice if that string of hex digits was easy to decode back into a binary string.

>>> import binascii
>>> b = binascii.a2b_hex(h.hexdigest())

Hmmm. So *SOME* of the functions in binascii do accept Unicode strings. See Issue 4770 :-). -- assignee: georg.brandl components: Documentation messages: 78480 nosy: beazley, georg.brandl severity: normal status: open title: Bad examples in hashlib documentation type: behavior versions: Python 3.0 ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue4771 ___
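The distinction can be checked directly (the sha224 digest string is the one quoted from the docs above):

```python
import hashlib

h = hashlib.sha224(b"Nobody inspects the spammish repetition")
# digest() really is bytes; hexdigest() is str -- so the b'...'
# prefix shown on hexdigest() output in the doc examples is wrong.
print(type(h.digest()).__name__)     # bytes
print(type(h.hexdigest()).__name__)  # str
print(h.hexdigest())  # a4337bc45a8fc544c03f52dc550cd6e1e87021bc896588bd79e901e2
```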
[issue4771] Bad examples in hashlib documentation
David M. Beazley beaz...@users.sourceforge.net added the comment: The digest() method of hashes does produce bytes (correct). The hexdigest() method produces a string, but it is also shown as producing bytes in the examples. ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue4771 ___
[issue4773] HTTPMessage not documented and has inconsistent API across 2.6/3.0
New submission from David M. Beazley beaz...@users.sourceforge.net: A file-like object u returned by the urlopen() function in both Python 2.6/3.0 has a method info() that returns a 'HTTPMessage' object. For example:

::: Python 2.6
>>> from urllib2 import urlopen
>>> u = urlopen("http://www.python.org")
>>> u.info()
<httplib.HTTPMessage instance at 0xce5738>

::: Python 3.0
>>> from urllib.request import urlopen
>>> u = urlopen("http://www.python.org")
>>> u.info()
<http.client.HTTPMessage object at 0x4bfa10>

So far, so good. HTTPMessage is defined in two different modules, but that's fine (it's just library reorganization). Two major problems: 1. There is no documentation whatsoever on HTTPMessage. No description in the docs for httplib (python 2.6) or http.client (python 3.0). 2. The HTTPMessage object in Python 2.6 derives from mimetools.Message and has a totally different programming interface than HTTPMessage in Python 3.0 which derives from email.message.Message. Check it out:

::: Python 2.6
>>> dir(u.info())
['__contains__', '__delitem__', '__doc__', '__getitem__', '__init__', '__iter__', '__len__', '__module__', '__setitem__', '__str__', 'addcontinue', 'addheader', 'dict', 'encodingheader', 'fp', 'get', 'getaddr', 'getaddrlist', 'getallmatchingheaders', 'getdate', 'getdate_tz', 'getencoding', 'getfirstmatchingheader', 'getheader', 'getheaders', 'getmaintype', 'getparam', 'getparamnames', 'getplist', 'getrawheader', 'getsubtype', 'gettype', 'has_key', 'headers', 'iscomment', 'isheader', 'islast', 'items', 'keys', 'maintype', 'parseplist', 'parsetype', 'plist', 'plisttext', 'readheaders', 'rewindbody', 'seekable', 'setdefault', 'startofbody', 'startofheaders', 'status', 'subtype', 'type', 'typeheader', 'unixfrom', 'values']

::: Python 3.0
>>> dir(u.info())
['__class__', '__contains__', '__delattr__', '__delitem__', '__dict__', '__doc__', '__eq__', '__format__', '__ge__', '__getattribute__', '__getitem__', '__gt__', '__hash__', '__init__', '__iter__', '__le__', '__len__', '__lt__', '__module__', '__ne__', '__new__', '__reduce__', '__reduce_ex__', '__repr__', '__setattr__', '__setitem__', '__sizeof__', '__str__', '__subclasshook__', '__weakref__', '_charset', '_default_type', '_get_params_preserve', '_headers', '_payload', '_unixfrom', 'add_header', 'as_string', 'attach', 'defects', 'del_param', 'epilogue', 'get', 'get_all', 'get_boundary', 'get_charset', 'get_charsets', 'get_content_charset', 'get_content_maintype', 'get_content_subtype', 'get_content_type', 'get_default_type', 'get_filename', 'get_param', 'get_params', 'get_payload', 'get_unixfrom', 'getallmatchingheaders', 'is_multipart', 'items', 'keys', 'preamble', 'replace_header', 'set_boundary', 'set_charset', 'set_default_type', 'set_param', 'set_payload', 'set_type', 'set_unixfrom', 'values', 'walk']

I know that getting rid of mimetools was desired, but I have no idea if changing the API on HTTPMessage was intended or not. In any case, it's one of the only cases in the entire library where the programming interface to an object radically changes from 2.6 to 3.0. I ran into this problem with code that was trying to properly determine the charset encoding of the byte string returned by urlopen(). I haven't checked whether 2to3 deals with this or not, but it might be something for someone to look at in their copious amounts of spare time. -- components: Library (Lib) messages: 78486 nosy: beazley severity: normal status: open title: HTTPMessage not documented and has inconsistent API across 2.6/3.0 type: behavior versions: Python 2.6, Python 3.0 ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue4773 ___
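A sketch of the charset-detection task mentioned above, using the Python 3 interface where HTTPMessage inherits from email.message.Message. The headers are invented for illustration; with a real response you would call the same method on u.info().

```python
from email import message_from_string

# Build a Message the same way http.client parses response headers.
headers = "Content-Type: text/html; charset=utf-8\n\n"
msg = message_from_string(headers)

# Python 3 spelling; the 2.6 mimetools-based API spelled this
# roughly as msg.getparam('charset') instead.
print(msg.get_content_charset())  # utf-8
```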
[issue4773] HTTPMessage not documented and has inconsistent API across 2.6/3.0
David M. Beazley beaz...@users.sourceforge.net added the comment: Verified that 2to3 does not fix this. ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue4773 ___
[issue1194378] sendmsg() and recvmsg() for C socket module
David M. Beazley beaz...@users.sourceforge.net added the comment: Just a followup comment to note that adding support for sendmsg()/recvmsg() is what you need to do file descriptor passing between processes on Unix---another technique for writing network servers. ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue1194378 ___
[issue4758] Python 3.0 internet documentation needs work
New submission from David M. Beazley beaz...@users.sourceforge.net: I have recently completed a pretty thorough survey of library documentation for Python 3.0 in conjunction with an update I'm making to my book. This issue is not so much a bug as a documentation request. For all of the library modules related to network programming, it would be extremely useful to be very explicit about what methods work with strings and what methods require bytes. So many of these modules operate on small fragments of data (e.g., send a request, add a header, parse a query string, etc.). Sometimes using a string is okay, sometimes it's not and sadly, it's not often predictable. Part of the problem is that the documentation has been written for a Python 2 world where text strings and binary data were interchangeable. Anyways, this request minimally covers these modules: ftplib smtplib nntplib http.* urllib.* xmlrpc.* socketserver asynchat asyncore If there is interest, I can submit more detailed notes from my own work, but I'm not sure how the documentation maintainer would want this. Separate issue for each? Added as comments here? Please advise. -- assignee: georg.brandl components: Documentation messages: 78361 nosy: beazley, georg.brandl severity: normal status: open title: Python 3.0 internet documentation needs work type: feature request versions: Python 3.0 ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue4758 ___
[issue1194378] sendmsg() and recvmsg() for C socket module
David M. Beazley beaz...@users.sourceforge.net added the comment: Bump. This functionality seems to be needed if anyone is going to be messing around with advanced features of IPv6. As it stands, the socket module in Python 2.6/3.0 is incomplete without this. -- nosy: +beazley ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue1194378 ___
[issue4744] asynchat documentation needs to be more precise
New submission from David M. Beazley beaz...@users.sourceforge.net: The documentation for asynchat needs to be more precise in its use of strings vs. bytes. Unless the undocumented use_encoding attribute is set, it seems that all data should be bytes throughout (e.g., the terminator, inputs to push methods, etc.). I have no idea if the use_encoding attribute is officially blessed or not. However, to avoid magic behavior, I'm guessing that it would be better practice to be explicit in one's use of bytes vs. text rather than having it take place in the internals of asynchat. Advice welcome. -- assignee: georg.brandl components: Documentation messages: 78277 nosy: beazley, georg.brandl severity: normal status: open title: asynchat documentation needs to be more precise versions: Python 3.0 ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue4744 ___
[issue4694] _call_method() in multiprocessing documentation
New submission from David M. Beazley beaz...@users.sourceforge.net: The documentation for Proxy Objects in the multiprocessing module describes a method _call_method and gives various examples. The only problem is that the method is actually called _callmethod (i.e., no underscore between call and method). -- assignee: georg.brandl components: Documentation messages: 78038 nosy: beazley, georg.brandl severity: normal status: open title: _call_method() in multiprocessing documentation type: behavior versions: Python 2.6, Python 3.0 ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue4694 ___
[issue4694] _call_method() in multiprocessing documentation
David M. Beazley beaz...@users.sourceforge.net added the comment: The _get_value() method is also in error. It's called _getvalue() in the source code. ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue4694 ___
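The reported spellings are easy to confirm without spinning up a manager process, since both methods live on the BaseProxy base class (a quick illustrative check, not part of the original report):

```python
from multiprocessing.managers import BaseProxy

# The correct names have no underscore between "call"/"get" and the rest;
# the names the docs used at the time do not exist.
assert hasattr(BaseProxy, "_callmethod")
assert hasattr(BaseProxy, "_getvalue")
assert not hasattr(BaseProxy, "_call_method")
assert not hasattr(BaseProxy, "_get_value")
```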
[issue4695] Bad AF_PIPE address in multiprocessing documentation
New submission from David M. Beazley beaz...@users.sourceforge.net: In the Address Formats part of the Listeners and Clients section of the documentation for the multiprocessing module, AF_PIPE addresses are described as having this format: r'ServerName\\pipe\\PipeName' However, it is really this: r'\\ServerName\pipe\PipeName' Be careful with raw strings. The documentation is showing the output of repr(), not a properly formed raw string. I verified this fix on Windows XP. -- assignee: georg.brandl components: Documentation messages: 78041 nosy: beazley, georg.brandl severity: normal status: open title: Bad AF_PIPE address in multiprocessing documentation versions: Python 2.6, Python 3.0 ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue4695 ___
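The repr()-vs-raw-string confusion the report describes can be demonstrated directly (an illustrative sketch; "ServerName"/"PipeName" are placeholder names, as in the docs):

```python
# A correctly formed AF_PIPE address as a raw string literal:
addr = r'\\ServerName\pipe\PipeName'

# It is the same string as the fully escaped spelling:
assert addr == '\\\\ServerName\\pipe\\PipeName'

# repr() doubles every backslash, which is likely how the malformed
# doubled-backslash form crept into the documentation:
assert repr(addr) == r"'\\\\ServerName\\pipe\\PipeName'"
```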
[issue4686] Exceptions in ConfigParser don't set .args
New submission from David M. Beazley beaz...@users.sourceforge.net: The ConfigParser module defines a variety of custom exceptions, many of which take more than one argument (e.g., InterpolationError, NoOptionError, etc.). However, none of these exceptions properly set the .args attribute. For example, shouldn't NoOptionError be defined as follows:

class NoOptionError(Error):
    def __init__(self, option, section):
        Error.__init__(self, "No option %r in section: %r" % (option, section))
        self.option = option
        self.section = section
        self.args = (option, section)   #!! Added this line

This is kind of a minor point, but the missing args means that these exceptions don't work properly with programs that need to do fancy kinds of exception processing (i.e., catching errors and reraising them in a different context or process). I can't speak for Python 3.0, but it's my understanding that .args should always be set to the exception arguments. Don't ask how I came across this---it was the source of a really bizarre bug in something I was playing around with. -- components: Library (Lib) messages: 77983 nosy: beazley severity: normal status: open title: Exceptions in ConfigParser don't set .args type: behavior versions: Python 2.1.1, Python 2.1.2, Python 2.2, Python 2.2.1, Python 2.2.2, Python 2.2.3, Python 2.3, Python 2.4, Python 2.5, Python 2.5.3, Python 2.6, Python 2.7, Python 3.0, Python 3.1 ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue4686 ___
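Why .args matters for "reraising in a different process" can be shown with pickling, since Python reconstructs a pickled exception from its .args (this is a hypothetical stand-in class for illustration, not ConfigParser's actual code):

```python
import pickle

class NoOptionError(Exception):
    def __init__(self, option, section):
        # Passing both values to Exception.__init__ populates .args,
        # which is what pickle uses to rebuild the exception.
        super().__init__(option, section)
        self.option = option
        self.section = section

    def __str__(self):
        return "No option %r in section: %r" % (self.option, self.section)

# Round-trip through pickle, as happens when an exception crosses a
# process boundary (e.g., from a multiprocessing worker):
e = pickle.loads(pickle.dumps(NoOptionError("timeout", "server")))
assert e.args == ("timeout", "server")
```

If .args held only the pre-formatted message string, the two-argument reconstruction above would fail.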
[issue4561] Optimize new io library
David M. Beazley beaz...@users.sourceforge.net added the comment: I wish I shared your optimism about this, but I don't. Here's a short explanation why. The problem of I/O and the associated interface between hardware, the operating system kernel, and user applications is one of the most fundamental and carefully studied problems in all of computer systems. The C library and its associated I/O functionality provide the user-space implementation of this interface. However, if you peel the covers off of the C library, you're going to find a lot of really hairy stuff in there. Examples might include:

1. Low-level optimization related to the system hardware (processor architecture, caching, I/O bus, etc.).
2. Hand-written finely tuned assembly code.
3. Low-level platform-specific system calls such as ioctl().
4. System calls related to shared memory regions, kernel buffers, etc. (i.e., optimizations that try to eliminate buffer copies).
5. Undocumented vendor-specific proprietary system calls (i.e., unknown magic).

So, you'll have to forgive me for being skeptical, but I just don't think any programmer is going to sit down and bang out a new implementation of buffered I/O that is going to match the performance of what's provided by the C library. Again, I would love to be proven wrong. ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue4561 ___
[issue4561] Optimize new io library
David M. Beazley beaz...@users.sourceforge.net added the comment: Good luck with that. Most people who get bright ideas such as gee, maybe I'll write my own version of X where X is some part of the standard C library pertaining to I/O, end up fighting a losing battle. Of course, I'd love to be proven wrong, but I don't think I will in this case. As for cranking data, that does not necessarily imply heavy-duty CPU processing. Someone might be reading large datafiles simply to perform some kind of data extraction, filtering, minor translation, or other operation that is easily carried out in Python, but where the programs are still I/O bound. For example, the kinds of processing one might otherwise do using awk, sed, perl, etc. ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue4561 ___
[issue4561] Optimize new io library
David M. Beazley beaz...@users.sourceforge.net added the comment: I agree with Raymond. For binary reads, I'll go farther and say that even a 10% slowdown in performance would be surprising if not unacceptable to some people. I know that as hard as it might be for everyone to believe, there are a lot of people who crank lots of non-Unicode data with Python. In fact, Python 2.X is pretty good at it. It's fine that text mode now uses Unicode, but if I don't want that, I would certainly expect the binary file modes to run at virtually the same speed as Python 2 (e.g., okay, they work with bytes instead of strings, but is the bytes type really all that different from the old Python 2 str type?). ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue4561 ___
[issue4593] Documentation for multiprocessing - Pool.apply()
New submission from David M. Beazley [EMAIL PROTECTED]: The documentation for the apply() and apply_async() methods of a Pool object might emphasize that these operations execute func(*args, **kwargs) in only one of the pool workers and that func() is not being executed in parallel on all workers. On first reading, I actually thought it might be the latter (and had to do some experimentation to figure out what was actually happening). -- assignee: georg.brandl components: Documentation messages: 77312 nosy: beazley, georg.brandl severity: normal status: open title: Documentation for multiprocessing - Pool.apply() versions: Python 2.6, Python 2.7, Python 3.0 ___ Python tracker [EMAIL PROTECTED] http://bugs.python.org/issue4593 ___
[issue4593] Documentation for multiprocessing - Pool.apply()
David M. Beazley [EMAIL PROTECTED] added the comment: Actually, you shouldn't discount the potential usefulness of running apply() in all of the worker nodes. A lot of people coming from parallel programming know about things like global broadcasts, reductions, and so forth. For example, if I wanted to perform a global operation (maybe some kind of configuration) on all workers, I could see doing some kind of global apply() operation to do it. That said, I'm not actually asking for any new functionality. I'd just make it more clear that apply() is not performing a function call on all pool workers. Also, given that apply() blocks, I'm not exactly sure how useful it is in the context of actually performing work in parallel. You might want to emphasize that apply_async() is better suited for that (the only other way I could think of to take advantage of apply() in parallel would be to call it from separate threads in the process that created the pool). ___ Python tracker [EMAIL PROTECTED] http://bugs.python.org/issue4593 ___
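The distinction being discussed can be sketched in a few lines (an illustrative example with an arbitrary function and pool size, not code from the issue): apply() runs the function once, on a single worker, and blocks until it returns; apply_async() returns a handle immediately and is the variant suited to overlapping work.

```python
from multiprocessing import Pool

def square(x):
    return x * x

if __name__ == "__main__":
    with Pool(2) as pool:
        # apply() executes square() exactly once, on one worker, and blocks:
        assert pool.apply(square, (3,)) == 9

        # apply_async() submits without blocking; collect results later:
        handles = [pool.apply_async(square, (i,)) for i in range(4)]
        assert [h.get() for h in handles] == [0, 1, 4, 9]
```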
[issue4561] Optimize new io library
David M. Beazley [EMAIL PROTECTED] added the comment: I've done some profiling and the performance of reading line-by-line is considerably worse in Python 3 than in Python 2. For example, this code:

for line in open("somefile.txt"):
    pass

ran 35 times slower in Python 3.0 than Python 2.6 when I tested it on a big text file (100 Megabytes). If you disable Unicode by opening the file in binary mode, it runs even slower. This slowdown is really unacceptable for anyone who uses Python for parsing big non-Unicode text files (and I would claim that there are many such people). -- nosy: +beazley ___ Python tracker [EMAIL PROTECTED] http://bugs.python.org/issue4561 ___
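A self-contained version of this benchmark looks roughly like the following (the 100 MB file from the report is replaced here by a small generated file; line count and content are arbitrary, and only the timing pattern, not the numbers, carries over):

```python
import os
import tempfile
import time

# Generate a small throwaway text file standing in for the big input.
with tempfile.NamedTemporaryFile("w", delete=False, suffix=".txt") as f:
    for _ in range(1000):
        f.write("some sample text for the line-reading benchmark\n")
    path = f.name

# Time the line-by-line read, the same loop shape as in the report.
start = time.perf_counter()
count = 0
with open(path) as fh:
    for line in fh:
        count += 1
elapsed = time.perf_counter() - start

os.unlink(path)
```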
[issue4561] Optimize new io library
David M. Beazley [EMAIL PROTECTED] added the comment: Tried this using projects/python/branches/release30-maint and using the patch that was just attached. With a 66MB input file, here are the results of this code fragment:

for line in open(BIGFILE):
    pass

Python 2.6: 0.67s
Python 3.0: 32.687s (48 times slower)

This is running on a MacBook with a warm disk cache. For what it's worth, I didn't see any improvement with that patch. ___ Python tracker [EMAIL PROTECTED] http://bugs.python.org/issue4561 ___
[issue4561] Optimize new io library
David M. Beazley [EMAIL PROTECTED] added the comment: Just as one other followup, if you change the code in the last example to use binary mode like this:

for line in open(BIG, "rb"):
    pass

you get the following results:

Python 2.6: 0.64s
Python 3.0: 42.26s (66 times slower)

___ Python tracker [EMAIL PROTECTED] http://bugs.python.org/issue4561 ___
[issue4561] Optimize new io library
David M. Beazley [EMAIL PROTECTED] added the comment: Just checked it with branches/py3k and the performance is the same. ___ Python tracker [EMAIL PROTECTED] http://bugs.python.org/issue4561 ___