Re: japanese encoding iso-2022-jp in python vs. perl
On Oct 23, 3:37 am, kettle [EMAIL PROTECTED] wrote:

Hi, I am rather new to python, and am currently struggling with some encoding issues. I have some utf-8-encoded text which I need to encode as iso-2022-jp before sending it out to the world. I am using python's encode functions:

    var = var.encode('iso-2022-jp', 'replace')
    print var

I am using the 'replace' argument because there seem to be a couple of utf-8 japanese characters which python can't correctly convert to iso-2022-jp. The output looks like this:

    ↓東京???日比谷線?北千住行

However if I use perl's encode module to re-encode the exact same bit of text:

    $var = encode('iso-2022-jp', decode('utf8', $var))
    print $var

I get proper output (no unsightly question-marks):

    ↓東京メトロ日比谷線・北千住行

So, what's the deal?

Good thing my crystal ball is working. I can see clearly that the fourth character of the input is 'HALFWIDTH KATAKANA LETTER ME' (U+FF92), which is not present in ISO-2022-JP as defined by RFC 1468, so python converts it into a question mark, as you requested. Meanwhile perl, as usual, tries to guess what you want and silently converts that character into 'KATAKANA LETTER ME' (U+30E1), which is present in ISO-2022-JP.

Why can't python properly encode some of these characters?

Because explicit is better than implicit. Do you care about roundtripping? Do you care about the width of characters? What about full-width characters (U+FF02)? Python doesn't know the answers to these questions, so it doesn't do anything with your input. You have to do it yourself. Assuming you don't care about roundtripping and width, here is an example demonstrating how to deal with narrow characters:

    from unicodedata import normalize
    iso2022_squeezing = dict((i, normalize('NFKC', unichr(i))) for i in range(0xFF61, 0xFFE0))
    print repr(u'\uFF92'.translate(iso2022_squeezing))

It prints u'\u30e1'. Feel free to ask questions if something is not clear. Note, this is just an example; I *don't* claim it does what you want for every character in the FF61-FFDF range. You may want to carefully review the whole unicode block: http://www.unicode.org/charts/PDF/UFF00.pdf

-- Leo.
-- http://mail.python.org/mailman/listinfo/python-list
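For readers on Python 3, the same idea can be sketched roughly as follows. This is only an illustration under the same assumptions as the reply (folding width is acceptable, round-tripping is not needed); the names `HALFWIDTH_FOLD` and `to_iso2022jp` are made up for this example.

```python
import unicodedata

# Map every code point in the halfwidth range U+FF61..U+FFDF to its NFKC
# form. Assumption: losing the halfwidth/fullwidth distinction is fine.
HALFWIDTH_FOLD = {cp: unicodedata.normalize('NFKC', chr(cp))
                  for cp in range(0xFF61, 0xFFE0)}

def to_iso2022jp(text):
    """Fold halfwidth forms, then encode; anything still unencodable
    becomes '?' because of the 'replace' error handler."""
    return text.translate(HALFWIDTH_FOLD).encode('iso-2022-jp', 'replace')

folded = '\uFF92'.translate(HALFWIDTH_FOLD)
print(ascii(folded))  # '\u30e1'

# Halfwidth ME, TO, RO become ordinary katakana before encoding.
encoded = to_iso2022jp('\uFF92\uFF84\uFF9B')
print(encoded.decode('iso-2022-jp'))
```

As in the original example, the table deserves a review against the Unicode chart before it is trusted for every character in the range.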
Re: Portable general timestamp format, not 2038-limited
On Jun 27, 10:51 pm, Paul Rubin http://[EMAIL PROTECTED] wrote: The difficulty/impossibility of computing intervals on UTC because of leap seconds suggests TAI is a superior timestamp format. If you care about intervals you'd better keep timestamps in SI seconds since some zero time point (just like OP wanted). TAI timestamps are pretty useless IMHO. They need to be converted to decimal/float for interval calculations and they don't represent any legal time. -- Leo
Re: String formatting for complex writing systems
On Jun 27, 12:20 am, Andy [EMAIL PROTECTED] wrote:

Hi guys, I'm writing a piece of software for a Thai friend. At the end it is supposed to print on paper a report with tables of text and numbers. When I test it in English, the columns are aligned nicely, but when he tests it with Thai data, the columns are all crooked. The problem here is that in the Thai writing system sometimes two or more characters together take one single space, for example งิ (u'\u0E07\u0E34'). This is why when I use something like u'%10s' % ..., it just doesn't work as expected. Is anybody aware of an alternative string format function that can deal with this kind of writing properly?

In the general case it's impossible to write such a function for many unicode characters without feedback from the rendering library. Assuming you use a *fixed* font for English and Thai, the following function will return how many columns your text will use:

    from unicodedata import category

    def columns(self, s):
        return sum(1 for c in s if category(c) != 'Mn')

-- Leo
Re: String formatting for complex writing systems
On Jun 27, 3:10 am, Leo Kislov [EMAIL PROTECTED] wrote:

On Jun 27, 12:20 am, Andy [EMAIL PROTECTED] wrote:

Hi guys, I'm writing a piece of software for a Thai friend. At the end it is supposed to print on paper a report with tables of text and numbers. When I test it in English, the columns are aligned nicely, but when he tests it with Thai data, the columns are all crooked. The problem here is that in the Thai writing system sometimes two or more characters together take one single space, for example งิ (u'\u0E07\u0E34'). This is why when I use something like u'%10s' % ..., it just doesn't work as expected. Is anybody aware of an alternative string format function that can deal with this kind of writing properly?

In the general case it's impossible to write such a function for many unicode characters without feedback from the rendering library. Assuming you use a *fixed* font for English and Thai, the following function will return how many columns your text will use:

    from unicodedata import category

    def columns(self, s):
        return sum(1 for c in s if category(c) != 'Mn')

That should of course be written as def columns(s). Need to learn to proofread before posting :)

-- Leo
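A Python 3 sketch of the corrected function, with a padding helper built on top of it. The helper `ljust_columns` is hypothetical (it is not from the thread), and the whole approach still assumes a fixed-width font in which nonspacing marks take no column of their own.

```python
from unicodedata import category

def columns(s):
    """Count display columns, assuming a fixed-width font in which
    nonspacing marks (category 'Mn') occupy no column of their own."""
    return sum(1 for c in s if category(c) != 'Mn')

def ljust_columns(s, width):
    # Hypothetical helper: pad on the right by display columns rather
    # than by code points, which is what '%-10s' would do.
    return s + ' ' * max(0, width - columns(s))

thai = '\u0e07\u0e34'  # consonant NGO NGU followed by vowel sign SARA I
print(len(thai), columns(thai))  # 2 1
print(repr(ljust_columns(thai, 10)))
```

The padded string has 11 code points but is meant to occupy 10 columns, which is the alignment the original `u'%10s'` formatting could not give.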
Re: Method much slower than function?
On Jun 13, 5:40 pm, [EMAIL PROTECTED] wrote:

Hi all, I am running Python 2.5 on Feisty Ubuntu. I came across some code that is substantially slower when in a method than in a function.

    cProfile.run("bar.readgenome(open('cb_foo'))")
    20004 function calls in 10.214 CPU seconds

    cProfile.run("z=r.readgenome(open('cb_foo'))")
    20004 function calls in 0.041 CPU seconds

I suspect open files are cached, so the second reader picks up where the first one left off: at the end of the file. The second call doesn't do any text processing at all.

-- Leo
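The effect suspected above is easy to demonstrate when one file object really is shared between two readers. The sketch below (Python 3) uses an in-memory `io.StringIO` as a stand-in for `open('cb_foo')`, and `count_lines` is a made-up placeholder for the real parsing work; whether this is what happened in the profiled program is only a guess.

```python
import io

def count_lines(f):
    # Placeholder for real processing: consume the file object line by line.
    return sum(1 for _ in f)

f = io.StringIO("line 1\nline 2\nline 3\n")  # stand-in for open('cb_foo')

first = count_lines(f)    # reads up to the end of the file
second = count_lines(f)   # same object, already at EOF: nothing left to do
f.seek(0)                 # rewind explicitly
third = count_lines(f)

print(first, second, third)  # 3 0 3
```

A timing comparison is only fair if each run gets a freshly opened (or rewound) file.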
Re: How to wrap a Japanese text in Python
On Jun 7, 5:12 am, [EMAIL PROTECTED] wrote:

Hi All, I am trying to wrap a japanese text in Python, by the following code.

    if len(message) > 54:
        message = message.decode('UTF8')
        strlist = textwrap.wrap(message,54)

After this I am writing it to a CAD software window. While displaying in this window some Japanese characters at the end of the line and some at the beginning of the line are not displayed at all. Meaning the text wrapping is not happening correctly. Can anybody please help me out in resolving this problem.

First of all you should move the message.decode('utf-8') call out of the if, and you don't need the if anyway, because if the line is shorter than 54 textwrap won't touch it:

    message = message.decode('utf-8')
    strlist = textwrap.wrap(message, 54)

I don't know Japanese but the following example *seems* to work fine for me:

    # -*- coding: utf-8 -*-
    sample=u
    import textwrap
    for line in textwrap.wrap(sample, 6):
        print line

Result:

Can you post a short example that clearly demonstrates the problem?

-- Leo
Re: How to wrap a Japanese text in Python
On Jun 8, 2:24 am, Leo Kislov [EMAIL PROTECTED] wrote:

On Jun 7, 5:12 am, [EMAIL PROTECTED] wrote:

Hi All, I am trying to wrap a japanese text in Python, by the following code.

    if len(message) > 54:
        message = message.decode('UTF8')
        strlist = textwrap.wrap(message,54)

After this I am writing it to a CAD software window. While displaying in this window some Japanese characters at the end of the line and some at the beginning of the line are not displayed at all. Meaning the text wrapping is not happening correctly. Can anybody please help me out in resolving this problem.

First of all you should move the message.decode('utf-8') call out of the if, and you don't need the if anyway, because if the line is shorter than 54 textwrap won't touch it:

    message = message.decode('utf-8')
    strlist = textwrap.wrap(message, 54)

I don't know Japanese but the following example *seems* to work fine for me:

    # -*- coding: utf-8 -*-
    sample=u
    import textwrap
    for line in textwrap.wrap(sample, 6):
        print line

Result:

Oh, my. IE7 and/or Google groups ate my Japanese text :( But I hope you've got the idea: try to work on a small example python program in a unicode-friendly IDE, for example IDLE.

Can you post a short example that clearly demonstrates the problem?

This question is still valid.

-- Leo.
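Since the sample text was lost in transit, here is a stand-in sketch in Python 3. The hiragana string is an arbitrary placeholder, not the poster's data. It shows that `textwrap.wrap` counts characters: a run of text without spaces is simply cut every `width` characters, and nothing is dropped.

```python
import textwrap

# Arbitrary placeholder text (15 hiragana letters, no spaces).
sample = '\u3042\u3044\u3046\u3048\u304a\u304b\u304d\u304f\u3051\u3053\u3055\u3057\u3059\u305b\u305d'

lines = textwrap.wrap(sample, 6)
for line in lines:
    print(len(line), line)

# No character is lost by the wrapping itself.
print(''.join(lines) == sample)  # True
```

If characters go missing only on screen, one thing worth checking (an assumption, not something established in the thread) is whether the display treats each Japanese character as two columns wide, in which case a width of 54 characters may overflow a 54-column window.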
Re: Python memory handling
On May 31, 8:06 am, [EMAIL PROTECTED] wrote:

Hello, I will try later with python 2.5 under linux, but as far as I can see, it's the same problem under my windows python 2.5. After reading this document: http://evanjones.ca/memoryallocator/python-memory.pdf I think it's because lists or dictionaries are used by the parser, and python uses an internal memory pool (not pymalloc) for them...

If I understand the document correctly you should be able to free the list and dict caches if you create more than 80 new lists and dicts:

    [(list(), dict()) for i in range(88)]

If it doesn't help, that means either 1) the list/dict caches don't really work like I think, or 2) pymalloc cannot return memory because of fragmentation, and that is not simple to fix.

-- Leo
Re: getmtime differs between Py2.5 and Py2.4
On May 7, 4:15 pm, Irmen de Jong [EMAIL PROTECTED] wrote:

Martin v. Löwis wrote:

Is this a bug?

Why don't you read the responses posted earlier? John Machin replied (in [EMAIL PROTECTED]) that you are mistaken: There is NO difference between the outcome of os.path.getmtime between Py2.5 and Py2.4. It always did return UTC, and always will. Regards, Martin

Err.:

    [E:\Projects]dir *.py
     Volume in drive E is Data      Serial number is 2C4F:9C2D
     Directory of E:\Projects\*.py

    31-03-2007  20:46    511  log.py
    25-11-2006  16:59    390  p64.py
     7-03-2007  23:07    207  sock.py
     3-02-2007  16:15    436  threads.py
         1.544 bytes in 4 files and 0 dirs    16.384 bytes allocated
         287.555.584 bytes free

    [E:\Projects]c:\Python24\python.exe -c "import os; print os.path.getmtime('p64.py')"
    1164470381

    [E:\Projects]c:\Python25\python.exe -c "import os; print os.path.getmtime('p64.py')"
    1164466781.28

This is python 2.4.4 and Python 2.5.1 on windows XP. The reported time clearly differs.

Let me guess: your E drive uses the FAT filesystem?

-- Leo
Re: invoke user's standard mail client
On May 7, 2:00 pm, [EMAIL PROTECTED] [EMAIL PROTECTED] wrote:

On May 7, 10:28 am, Gabriel Genellina [EMAIL PROTECTED] wrote:

Get the pywin32 package (Python for Windows extensions) from sourceforge, install it, and look into the win32comext\mapi\demos directory.

Thanks for the hint, Gabriel. Wow, that's heavily spiced code! When I invoke mapisend.py I get:

    Traceback (most recent call last):
      File "mapisend1.py", line 85, in <module>
        SendEMAPIMail(SendSubject, SendMessage, SendTo, MAPIProfile=MAPIProfile)
      File "mapisend1.py", line 23, in SendEMAPIMail
        mapi.MAPIInitialize(None)
    pywintypes.com_error: (-2147467259, 'Unspecified error', None, None)

But what is a MAPI profile?

It's an abstraction of incoming and outgoing mail accounts. In UNIX terms it's kind of like running a local sendmail that forwards mail to another server and a fetchmail that fetches mail from external inboxes, i.e. it's a proxy between you and the outgoing/incoming mail servers.

I left this variable blank. Do I need MS Exchange Server to run this demo?

No, but you need an account on some mail server, and some email program should create a MAPI profile to represent that account on your local computer. As I understand it, creating MAPI profiles is not a common practice among non-Microsoft products; for example my computer with Lotus Notes doesn't have any MAPI profiles.

-- Leo
Re: relative import broken?
On May 3, 10:08 am, Alan Isaac [EMAIL PROTECTED] wrote:

Alex Martelli [EMAIL PROTECTED] wrote in message news:[EMAIL PROTECTED]

Very simply, PEP 328 explains: "Relative Imports and __name__: Relative imports use a module's __name__ attribute to determine that module's position in the package hierarchy. If the module's name does not contain any package information (e.g. it is set to '__main__') then relative imports are resolved as if the module were a top level module, regardless of where the module is actually located on the file system."

To change my question somewhat, can you give me an example where this behavior (when __name__ is '__main__') would be useful for a script? (I.e., more useful than importing relative to the directory holding the script, as indicated by __file__.)

Do you realize it's a different behaviour and it won't work for some packages? One possible alternative is to assume an empty parent package and let "from . import foo" work, but not "from .. import bar" or any other upper levels. The package author should also realize that __init__.py will be ignored.

-- Leo
Re: hp 11.11 64 bit python 2.5 build gets error import site failed
On May 3, 2:54 pm, Martin v. Löwis [EMAIL PROTECTED] wrote:

import site failed
OverflowError: signed integer is greater than the maximum.

- what is the value of ival?

ival: 4294967295

I see. This is 0xFFFFFFFF, which would be -1 if it were of type int. So perhaps some value got cast incorrectly at some point, breaking subsequent computations.

- where does that number come from?

It is coming from the call to PyInt_AsLong. In that function there is a call to PyInt_AS_LONG((PyIntObject*)op), which returns the value of ival.

That was not my question, really. I wanted to know where the object whose AsLong value was taken came from. And before you say it's in the arg parameter of convertsimple(): sure it is. However, how did it get there? It's in an argument tuple, and where did that come from?

Looking at the call stack the OP posted, -1 is coming in as the fourth parameter of __import__, I *guess* at the first import in site.py or at the implicit import of site. I think it'd be helpful if the OP also tried whether this works:

    python -S -v -c "print -1, type(-1), id(0), id(-1)"

-- Leo
Re: My Python annoyances
On May 3, 9:27 pm, Gabriel Genellina [EMAIL PROTECTED] wrote:

On Thu, 03 May 2007 10:49:26 -0300, Ben Collver [EMAIL PROTECTED] wrote:

I tried to write portable Python code. The zlib CRC function returned different results on 32 bit and 64 bit architectures. I filed a bug report. It was closed, without a comment from the person who closed it. I get the unspoken message: bug reports are not welcome.

You got a comment from me, which you never disputed nor commented on further. I would have changed the status to invalid myself, if I were able to do so.

I think it should have been marked as "won't fix", as it's a wart just like 1/2 == 0, but as there are many users of the current behaviour it's impossible to fix it in Python 2.x. Maybe in Python 3.0?

-- Leo
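A common way to get a value that is the same on every platform and Python version is to mask the result down to an unsigned 32-bit integer. The wrapper name `crc32_unsigned` below is made up for this sketch; only `zlib.crc32` is the real library call.

```python
import zlib

def crc32_unsigned(data):
    """Return the CRC-32 of a bytes object as an unsigned 32-bit integer,
    regardless of whether the underlying call yields a signed value."""
    return zlib.crc32(data) & 0xffffffff

# 0xCBF43926 is the standard CRC-32 check value for the ASCII digits 1..9.
print(hex(crc32_unsigned(b"123456789")))  # 0xcbf43926
```

The mask is harmless where the result is already unsigned, so the same line can be used in code that has to run on several interpreters.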
Re: Python's handling of unicode surrogates
On Apr 20, 7:34 pm, Rhamphoryncus [EMAIL PROTECTED] wrote:

On Apr 20, 6:21 pm, Martin v. Löwis [EMAIL PROTECTED] wrote:

If you absolutely think support for non-BMP characters is necessary in every program, suggesting that Python use UCS-4 by default on all systems has a higher chance of finding acceptance (in comparison).

I wish to write software that supports Unicode. Like it or not, Unicode goes beyond the BMP, so I'd be lying if I said I supported Unicode if I only handled the BMP.

Having the ability to iterate over code points doesn't mean you support Unicode. For example, if you want to determine whether a string is one word and you iterate over code points and call isalpha, you'll get an incorrect result in some cases in some languages. (Just to back up this claim: this isn't going to work at least in Russian. The Russian language uses U+0301 COMBINING ACUTE ACCENT, which is not part of the alphabet but is an element of the Russian writing system.) IMHO what is really needed is a bunch of high level methods like

    .graphemes()  - iterate over graphemes
    .codepoints() - iterate over codepoints
    .isword()     - check if the string represents one word

etc... Then you can actually support all unicode characters in a utf-16 build of Python. Just make all existing unicode methods (except unicode.__iter__) iterate over code points. Changing __iter__ to iterate over code points will make indexing weird. When the programmer is *ready* to support unicode he/she will explicitly call .codepoints() or .graphemes(). As they say: You can lead a horse to water, but you can't make it drink.

-- Leo
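The Russian example can be checked directly in Python 3. The function `is_word` below is a hypothetical approximation of the proposed `.isword()`: it accepts letters plus combining marks that follow a letter. It is a rough sketch, not a full grapheme or word-boundary algorithm.

```python
import unicodedata

def is_word(s):
    """Hypothetical stand-in for .isword(): every character is either a
    letter or a combining mark attached to a preceding letter."""
    seen_letter = False
    for ch in s:
        if ch.isalpha():
            seen_letter = True
        elif unicodedata.category(ch).startswith('M') and seen_letter:
            continue  # mark belongs to the letter before it
        else:
            return False
    return seen_letter

# Russian "zamok" written with a stress mark (U+0301) over the second vowel.
word = '\u0437\u0430\u043c\u043e\u0301\u043a'

print(word.isalpha())  # False: the combining accent is not a letter
print(is_word(word))   # True
```

A lone combining mark, or a string containing spaces or digits, is still rejected by this helper.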
Re: iterator interface for Queue?
On Apr 7, 11:40 pm, Paul Rubin http://[EMAIL PROTECTED] wrote:

Is there any reason Queue shouldn't have an iterator interface? I.e. instead of

    while True:
        item = work_queue.get()
        if item is quit_sentinel:
            # put sentinel back so other readers can find it
            work_queue.put(quit_sentinel)
            break
        process(item)

It's almost equal to:

    for item in iter(work_queue.get, quit_sentinel):
        process(item)

except that it doesn't keep the quit sentinel in the queue. But that's a personal preference; I usually put as many quit sentinels in a queue as there are consumers.

-- Leo
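A small end-to-end sketch of the "one sentinel per consumer" approach in Python 3. The names `QUIT` and `consumer`, and the doubling that stands in for `process(item)`, are placeholders for this illustration.

```python
import queue
import threading

QUIT = object()  # unique sentinel; compares equal only to itself

def consumer(work_queue, results):
    # iter(callable, sentinel) keeps calling work_queue.get() until it
    # returns the sentinel, then the loop ends.
    for item in iter(work_queue.get, QUIT):
        results.append(item * 2)  # stand-in for process(item)

work_queue = queue.Queue()
results = []
workers = [threading.Thread(target=consumer, args=(work_queue, results))
           for _ in range(3)]
for w in workers:
    w.start()

for n in range(10):
    work_queue.put(n)
for _ in workers:
    work_queue.put(QUIT)  # one sentinel per consumer, none is put back

for w in workers:
    w.join()
print(sorted(results))  # [0, 2, 4, 6, 8, 10, 12, 14, 16, 18]
```

Because each consumer swallows exactly one sentinel, the queue is empty once all workers have exited.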
Re: I18n issue with optik
On Apr 1, 8:47 am, Thorsten Kampe [EMAIL PROTECTED] wrote:

I guess the culprit is this snippet from optparse.py:

    # used by test suite
    def _get_encoding(self, file):
        encoding = getattr(file, "encoding", None)
        if not encoding:
            encoding = sys.getdefaultencoding()
        return encoding

    def print_help(self, file=None):
        """print_help(file : file = stdout)

        Print an extended help message, listing all options and any
        help text provided with them, to 'file' (default stdout).
        """
        if file is None:
            file = sys.stdout
        encoding = self._get_encoding(file)
        file.write(self.format_help().encode(encoding, "replace"))

So this means: when the encoding of sys.stdout is US-ASCII, Optparse sets the encoding of the help text to ASCII, too.

The .encode() method doesn't set an encoding. It encodes unicode text into bytes according to the specified encoding. That means optparse needs ascii or unicode (at least) for help text. In other words, you'd better use unicode throughout your program.

But that's nonsense because the encoding is declared in the PO (localisation) file.

For backward compatibility gettext works with bytes by default, so the PO file encoding is not even involved. You need to use unicode gettext.

How can I set the encoding of sys.stdout to another encoding?

What are you going to set it to? As I understand it, you're going to distribute your program to some users. How are you going to find out the encoding of your users' terminals?

-- Leo
Re: shutil.copy Problem
On Mar 28, 7:01 am, David Nicolson [EMAIL PROTECTED] wrote:

Hi John, That was an excellent idea and it was the cause of the problem. Whether this is a bug in shutil I'm not sure. Here is the traceback, Python 2.4.3 on Windows XP:

    C:\Documents and Settings\Güstav>C:\python243\python Z:\sh.py
    Copying u'C:\\Documents and Settings\\G\xfcstav\\My Documents\\My Music\\iTunes\\iTunes Music Library.xml' ...
    Traceback (most recent call last):
      File "Z:\sh.py", line 12, in ?
        shutil.copy(xmlfile,"C:iTunes Music Library.xml")

Note, there is no backslash after C:. shutil will try to make an absolute file name and concatenate it with the current directory name (C:\Documents and Settings\Güstav), which contains non-ascii characters. Because of backward compatibility the absolute name won't be unicode. On the other hand, data coming from the registry is unicode. When shutil tries to compare those two file names it fails. To avoid the problem you need to make either both file names unicode or both file names byte-strings. However one thing is still a mystery to me. Your source code contains a backslash but your traceback doesn't:

    shutil.copy(xmlfile,"C:\iTunes Music Library.xml")

The shutil line needed to be changed to this to be successful:

    shutil.copy(xmlfile.encode("windows-1252"),"C:\iTunes Music Library.xml")

It will work only in some European locales. Using the locale module you can make it work for 99% of the world's users, but it will still fail in cases like a German locale with Greek characters in file names. Only using unicode everywhere in your program is a complete solution. Like

    shutil.copy(xmlfile, u"C:\iTunes Music Library.xml")

if you use a constant, or make sure your file name is unicode:

    dest = unicode(...)
    shutil.copy(xmlfile, dest)

-- Leo.
Re: Unicode zipping from Python code?
On Mar 26, 12:21 am, durumdara [EMAIL PROTECTED] wrote:

Hi! As I experienced in the year 2006, the Python's zip module is not unicode-safe.

I'd rather say unicode file names are not supported. Why? Because the zip format didn't support unicode file names up to 2006.

With the hungarian filenames I got wrong result. I need to convert iso-8859-2 to cp852 chset to get good result.

So you solved the problem, didn't you?

As I see, this module is a command line tool imported as extension. Now I search for something that can handle the characters good, or handle the unicode filenames.

You said you've got a good result, so it's not clear what you want.

Does anyone knows about a python project that can do this? Or other tool what I can use for zipping intern. characters?

Zipping is only half of the problem. How are you going to unzip such files?

-- Leo
Re: shutil.copy Problem
On Mar 26, 8:10 pm, David Nicolson [EMAIL PROTECTED] wrote:

Hi, I wasn't exactly sure where to send this, I don't know if it is a bug in Python or not. This is rare, but it has occurred a few times and seems to be reproducible for those who experience it. Examine this code:

    try:
        shutil.copy("/file.xml","/Volumes/External/file.xml")
    except Exception, err:
        print sys.exc_info()[0]
        print err

This is the output:

    exceptions.UnicodeDecodeError
    'ascii' codec can't decode byte 0xd6 in position 26: ordinal not in range(128)

What could the possible cause of this be?

Show us the traceback; without it I doubt anyone can help.

Shouldn't shutil simply be reading and writing the bytes and not character decoding them?

Yes, shutil.copy copies content verbatim.

-- Leo
Re: Making a non-root daemon process
On Mar 22, 11:19 pm, Ben Finney [EMAIL PROTECTED] wrote:

Howdy all, For making a Python program calve off an independent daemon process of itself, I found Carl J. Schroeder's recipe in the ASPN Python Cookbook. URL:http://aspn.activestate.com/ASPN/Cookbook/Python/Recipe/278731 This is a thorough approach, and I'm cribbing a simpler process from this example. One thing that strikes me is that the algorithm seems to depend on running the program as the root user.

    import os

    def become_daemon():
        pid = os.fork()
        if pid == 0:
            # This is the child of the fork
            # Become a process leader of a new process group
            os.setsid()
            # Fork again and exit this parent
            pid = os.fork()
            if pid == 0:
                # This is the child of the second fork -- the running process.
                pass
            else:
                # This is the parent of the second fork
                # Exit to prevent zombie process
                os._exit(0)
        else:
            # This is the parent of the fork
            os._exit(0)

    become_daemon()
    # Continue with the program

The double-fork seems to be to:
- Allow the first forked child to start a new process group
- Allow the second forked child to be orphaned immediately

The problem I'm having is that 'os.setsid()' fails with 'OSError: [Errno 1] Operation not permitted' unless I run the program as the root user. This isn't a program that I want necessarily running as root.

It works for me. I mean your program above produces no exceptions for me on Debian 3.1 with python2.4.

What does the 'os.setsid()' gain me?

It detaches you from the terminal. It means you won't receive signals from the terminal for sure. Like SIGINT and SIGHUP, but there may be others.

How can I get that without being the root user?

Maybe you can go over the list of all possible signals from the terminal and notify the kernel that you want to ignore them. It sounds similar to detaching from the terminal, but maybe there are some differences. But the fact that os.setsid fails for you is weird anyway.

-- Leo.
Re: lock problem
On Mar 16, 3:08 pm, Ritesh Raj Sarraf [EMAIL PROTECTED] wrote:

Leo Kislov wrote:

But you miss the fact that there is only one environment per process.

Maybe there's a confusion. The environment variable that I'm setting has nothing to do with ldapsearch. I use the environment variable as a filename to which ldapsearch can redirect its output. And I do that because the output can be huge and useless. Then I do some pattern matching on that file and filter my data and then delete it. If you think I still am missing something important, request you to describe it.

Imagine this timeline:

    thread1: os.environ['__kabc_ldap'] = '/tmp/tmp1'
    thread1 suspended, thread2 starts to run
    thread2: os.environ['__kabc_ldap'] = '/tmp/tmp2'
    thread2: launch ldapsearch (output goes to '/tmp/tmp2')
    thread2 suspended, thread1 starts to run
    thread1: launch ldapsearch (output goes to '/tmp/tmp2', over the output from the ldapsearch launched from thread2)

Seems like that's what is happening to your program.

-- Leo
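One way around the shared environment, sketched in Python 3: give each child process its own private copy of the environment instead of writing to `os.environ`. The child here is a tiny Python script standing in for ldapsearch, and the paths are just labels taken from the timeline; neither is the poster's real setup.

```python
import os
import subprocess
import sys
import threading

# Stand-in for ldapsearch: a child that reports the variable it sees.
CHILD = "import os; print(os.environ['__kabc_ldap'])"

def run_child(out_path, results, index):
    # Build a private copy of the environment for this one child process;
    # os.environ itself, which all threads share, is never modified.
    env = dict(os.environ, __kabc_ldap=out_path)
    proc = subprocess.run([sys.executable, "-c", CHILD], env=env,
                          capture_output=True, text=True, check=True)
    results[index] = proc.stdout.strip()

paths = ["/tmp/tmp1", "/tmp/tmp2"]
results = [None] * len(paths)
threads = [threading.Thread(target=run_child, args=(p, results, i))
           for i, p in enumerate(paths)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(results)  # ['/tmp/tmp1', '/tmp/tmp2']
```

Each child sees only the value its own thread chose, whatever order the threads happen to run in.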
Re: lock problem
On Mar 16, 12:40 am, Ritesh Raj Sarraf [EMAIL PROTECTED] wrote:

Leo Kislov wrote:

You're changing the environment variable __kabc_ldap, which is shared between your threads. The environment is not designed for that kind of usage; it was designed for settings. Either use an option to set the output file or just redirect stdout. If the interface of ldapsearch is so lame that it requires an environment variable, use env to set the variable:

    env __kabc_ldap=/tmp/wrjhdsf ldapsearch ...

The environment variable is set with temp_file_name which gets the name from tempfile.mkstemp(), which is run in every thread. So I don't think the environment variable is going to be the same.

But you miss the fact that there is only one environment per process.

-- Leo
Re: lock problem
On Mar 15, 2:31 pm, Ritesh Raj Sarraf [EMAIL PROTECTED] wrote:

[snip]

    os.environ['__kabc_ldap'] = temp_file_name

[snip]

Now as per the above code, aa is the first string which will be executed in Thread-1. In my query to the ldap server, I am getting a record which matches the aa string. I've verified it by putting a breakpoint and checking the value. The problem is that when I run the program manually, I don't get the data from the first thread i.e. of the string aa. I'm not sure if there's something wrong in the code mentioned above or is it really a lock problem. Can somebody please help about where I'm making a mistake?

You're changing the environment variable __kabc_ldap, which is shared between your threads. The environment is not designed for that kind of usage; it was designed for settings. Either use an option to set the output file or just redirect stdout. If the interface of ldapsearch is so lame that it requires an environment variable, use env to set the variable:

    env __kabc_ldap=/tmp/wrjhdsf ldapsearch ...

-- Leo
Re: INSERT statements not INSERTING when using mysql from python
Ask Ben, he might know, although he's out to lunch.

Ben wrote:

I'll try it after lunch. Does anyone know whether this might be the problem? Ben

Ben wrote:

I have found the problem, but not the cause. I tried setting the database up manually beforehand, which let me get rid of the IF NOT EXISTS lines, and now it works! But why the *** should it not work anyway? The first time it is run, there is no database or tables, so it creates them. That works. But apparently on subsequent runs it decides the tables it created aren't actually there, and overwrites them. Gr. Ben

Ben wrote:

Well, I've checked the SQL log, and my insert statements are certainly being logged. The only option left open is that the table in question is being replaced, but I can't see why it should be...

Ben wrote:

Nope... that can't be it. I tried running those commands manually and nothing went wrong. But then again when I execute the problematic command manually nothing goes wrong. It's just not executing until the last time, or being overwritten.

Ben wrote:

Each time my script is run, the following is called:

    self.cursor.execute("CREATE DATABASE IF NOT EXISTS "+name)
    self.cursor.execute("USE "+name)
    self.cursor.execute("CREATE TABLE IF NOT EXISTS table_name (

The idea being that stuff is only created the first time the script is run, and after that the original tables and database are used. This might explain my problem if for some reason the old tables are being replaced... can anyone see anything wrong with the above? Ben

Ben wrote:

One partial explanation might be that for some reason it is recreating the table each time the code runs. My code says CREATE TABLE IF NOT EXISTS, but if for some reason it is creating it anyway and dropping the one before, that could explain why there are missing entries. It wouldn't explain why the NOT EXISTS line is being ignored though... Ben

Ben wrote:

I initially had it set up so that when I connected to the database I started a transaction, then when I disconnected I committed. I then tried turning autocommit on, but that didn't seem to make any difference (although initially I thought it had). I'll go back and see what I can find... Cheers, Ben

johnf wrote:

Ben wrote:

I don't know whether anyone can help, but I have an odd problem. I have a PSP (Spyce) script that makes many calls to populate a database. They all work without any problem except for one statement. I first connect to the database...

    self.con = MySQLdb.connect(user=username, passwd=password)
    self.cursor = self.con.cursor()
    self.cursor.execute("SET max_error_count=0")

All the necessary tables are created...

    self.cursor.execute("CREATE DATABASE IF NOT EXISTS "+name)
    self.cursor.execute("USE "+name)
    self.cursor.execute("CREATE TABLE IF NOT EXISTS networks (SM varchar(20),DMC int,DM varchar(50),NOS int,OS varchar(50),NID varchar(20))")

Then I execute many insert statements in various different loops on various tables, all of which are fine, and result in multiple table entries. The following one is executed many times also, and seems identical to the rest. The print statements output to the browser window, and appear repeatedly, so the query must be being called repeatedly also:

    print "<p><b>SQL query executing</b><p>"
    self.cursor.execute("INSERT INTO networks VALUES ('a',' "+i+" ','c','2','e','f','g')")
    print "<p><b>SQL query executed</b><p>"

I have, for debugging, set i up as a counter variable. No errors are given, but the only entry to appear in the final database is that from the final execution of the INSERT statement (the last value of i). I suspect that this is too vague for anyone to be able to help, but if anyone has any ideas I'd be really grateful :-) It occurred to me that if I could access the mysql query log that might help, but I was unsure how to enable logging for MySQL with python. Cheers, Ben

Not sure this will help, but where is the commit? I don't use MySQL but most SQL engines require a commit.

Johnf
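The effect of a missing commit can be shown with the standard library's sqlite3 module, used here purely as a stand-in for MySQLdb (Python 3; the table is a cut-down version of the one in the thread). Whether this is what happened to the poster is not established; MySQL's behaviour also depends on the storage engine and autocommit setting in use.

```python
import os
import sqlite3
import tempfile

path = os.path.join(tempfile.mkdtemp(), "demo.db")

con = sqlite3.connect(path)
con.execute("CREATE TABLE IF NOT EXISTS networks (sm TEXT, dmc INTEGER)")
con.commit()

# First INSERT: the connection is closed without commit(), so the
# pending transaction is rolled back and the row is lost.
con.execute("INSERT INTO networks VALUES (?, ?)", ("a", 1))
con.close()

# Second INSERT: committed before closing, so it survives.
con = sqlite3.connect(path)
con.execute("INSERT INTO networks VALUES (?, ?)", ("a", 2))
con.commit()
con.close()

con = sqlite3.connect(path)
rows = con.execute("SELECT sm, dmc FROM networks").fetchall()
con.close()
print(rows)  # [('a', 2)]
```

Only the committed row is visible on the next connection, and no error is raised for the lost one, which matches the "no errors are given" symptom at least superficially.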
Re: dealing with special characters in Python and MySQL
ronrsr wrote:

Try putting use_unicode=True in the MySQLdb connect call.

tried that, and also added charset="utf8" - now, I can't do any string operations, I get the error msg:

    descriptor 'lower' requires a 'str' object but received a 'unicode'
    args = ("descriptor 'lower' requires a 'str' object but received a 'unicode'",)

or similar, on every string operation.

What is a string operation? Every time you say "I get an error", please provide the source code where this error occurs. And by the way, do you know that for non-ascii characters you should use the unicode type, not the str type?

-- Leo
Re: urllib.unquote and unicode
George Sakkis wrote:
> The following snippet results in a different outcome for (at least) the last three major releases:
>
>     import urllib
>     urllib.unquote(u'%94')
>
>     # Python 2.3.4
>     u'%94'
>
>     # Python 2.4.2
>     UnicodeDecodeError: 'ascii' codec can't decode byte 0x94 in position 0: ordinal not in range(128)
>
>     # Python 2.5
>     u'\x94'
>
> Is the current version the right one, or is this function supposed to change every other week?

IMHO, none of the results is right. Either the unicode string should be rejected by raising ValueError, or it should be encoded with the ascii encoding, so that the result is the same as

    urllib.unquote(u'%94'.encode('ascii'))

that is, '\x94'. You can consider the current behaviour undefined: just as when you pass a random object into some function, you can get a different outcome in different Python versions.

-- Leo
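For comparison, a short sketch of how the same question looks in Python 3, where the function lives in urllib.parse. This is not part of the original thread. It shows that percent-decoding yields bytes, and that turning those bytes into text needs an explicit encoding choice:

```python
from urllib.parse import unquote, unquote_to_bytes

# Percent-decoding is fundamentally a bytes operation.
raw = unquote_to_bytes("%94")

# Turning the byte into text requires choosing an encoding explicitly.
as_latin1 = unquote("%94", encoding="latin-1")

# With the default UTF-8 decoding the lone byte 0x94 is invalid, and the
# default error handler ('replace') substitutes U+FFFD for it.
as_utf8 = unquote("%94")
```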
Re: writing serial port data to the gzip file
Petr Jakes wrote:
> I am trying to save data that is coming from the serial port continually for some period. (Reading from the serial port is 100% not a problem.) Following is an example of the code I am trying to write. It works, but it produces an empty gz file (0 kB size) even though I am sure I am getting data from the serial port. It looks like g.close() does not close the gz file. I was reading in the doc: "Calling a GzipFile object's close() method does not close fileobj, since you might wish to append more material after the compressed data..." so I am completely lost now. Thanks for your comments.
>
> Petr Jakes
>
> Snippet of the code:
>
>     def dataOnSerialPort():
>         data=s.readLine()
>         if data:
>             return data
>         else:
>             return 0
>
>     while 1:
>         g=gzip.GzipFile("/root/foofile.gz","w")
>         while dataOnSerialPort():
>             g.write(data)
>         else:
>             g.close()

Your while loop is discarding the result of dataOnSerialPort, so you're probably writing an empty string to the file many times. Typically this kind of loop is implemented using iterators. Check whether your s object (is it from an external library?) already implements the iterator protocol. If it does, then

    for data in s:
        g.write(data)

is all you need. If it doesn't, you can use iter to create an iterator for you:

    for data in iter(s.readLine, ''):
        g.write(data)

-- Leo
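The iter(callable, sentinel) idiom suggested above can be tried without any hardware. The sketch below is written in Python 3 and makes two assumptions: an io.BytesIO object stands in for the serial port (anything with a readline() method that returns empty bytes at the end would do), and the output path is a hypothetical temporary file:

```python
import gzip
import io
import os
import tempfile

# Stand-in for the serial port: readline() returns b"" when nothing is left.
fake_port = io.BytesIO(b"line one\nline two\nline three\n")

path = os.path.join(tempfile.mkdtemp(), "capture.gz")  # hypothetical path

with gzip.open(path, "wb") as g:
    # iter(callable, sentinel) keeps calling readline() until it returns b"".
    for data in iter(fake_port.readline, b""):
        g.write(data)

# Read the file back to confirm that the data really was written.
with gzip.open(path, "rb") as g:
    restored = g.read()
```

Using the file object as a context manager also guarantees that the gzip trailer is flushed when the loop ends.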
Re: connect from windows to linux using ssh
[EMAIL PROTECTED] wrote:
> Hi Folks, how can I connect from Windows to Linux using ssh without a username/password? I need to write a Python program for this scenario.

Use an ssh library: http://cheeseshop.python.org/pypi/paramiko

-- Leo
Re: Serial port failure
Rob wrote: Hi all, I am fairly new to python, but not programming and embedded. I am having an issue which I believe is related to the hardware, triggered by the software read I am doing in pySerial. I am sending a short message to a group of embedded boxes daisy chained via the serial port. When I send a 'global' message, all the connected units should reply with their Id and Ack in this format '0 Ack' To be certain that I didn't miss a packet, and hence a unit, I do the procedure three times, sending the message and waiting for a timeout before I run through the next iteration. Frequently I get through the first two iterations without a problem, but the third hangs up and crashes, requiring me to remove the Belkin USB to serial adapter, and then reconnect it. Here is the code: import sys, os import serial import sret import time from serial.serialutil import SerialException GetAck Procedure def GetAck(p): response = try: response = p.readline() except SerialException: print Timed out return -1 res = response.split() #look for ack in the return message reslen = len(response) if reslen 5: if res[1] == 'Ack': return res[0] elif res[1] == 'Nak': return 0x7F else: return -1 Snip GetNumLanes Procedure def GetNumLanes(Lanes): print Looking for connected units # give a turn command and wait for responses msg = .g t 0 336\n for i in range(3): port = OpenPort() time.sleep(3) print port.isOpen() print Request #%d % (i+1) try: port.writelines(msg) except OSError: print Serial port failure. Power cycle units port.close() sys.exit(1) done = False # Run first connection check #Loop through getting responses until we get a -1 from GetAck while done == False: # lane will either be -1 (timeout), 0x7F (Nak), # or the lane number that responded with an Ack lane = GetAck(port) if lane = '0': Your GetAck returns either string or number and then you compare it with a string. 
If you compare a string with a number, Python currently returns a result you probably don't expect:

    >>> -1 >= '0'
    False
    >>> 0x7f >= '0'
    False

This is a wart, and it will be fixed in Python 3.0 (the comparison will raise an exception). I think you should rewrite GetAck to return a tuple (state, lane):

    def GetAck(p):
        response = ""
        try:
            response = p.readline()
        except SerialException:
            print "Timed out"
            return 'Timeout', 'NoID'
        res = response.split()
        # look for ack in the return message
        reslen = len(response)
        if reslen > 5:
            if res[1] == 'Ack':
                return 'Ack', res[0]
            elif res[1] == 'Nak':
                return 'Nak', "Does Nak response contain lane id?"
        else:
            return 'Unknown', 'NoID'

And then instead of

    lane = GetAck(port)
    if lane >= '0':

use

    state, lane = GetAck(port)
    if state == 'Ack':

-- Leo
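A compact sketch of the (state, lane) idea, separated from the serial I/O so that it can be tried on plain strings. It uses Python 3 syntax. The function name parse_ack is hypothetical, and the code assumes that a Nak reply carries the lane id in its first field, which the thread leaves open:

```python
def parse_ack(response):
    """Return a (state, lane) tuple for one reply line such as '0 Ack'."""
    fields = response.split()
    if len(fields) < 2:
        # Empty or truncated reply, for example after a read timeout.
        return ("Timeout", None)
    lane, status = fields[0], fields[1]
    if status == "Ack":
        return ("Ack", lane)
    if status == "Nak":
        # Assumption: a Nak reply also starts with the lane id.
        return ("Nak", lane)
    return ("Unknown", None)


# In Python 3 the mixed comparison discussed above raises TypeError.
try:
    -1 >= "0"
    mixed_comparison_allowed = True
except TypeError:
    mixed_comparison_allowed = False
```

Because the state is always a string and never shares a slot with the lane, the caller can branch on it without comparing values of different types.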
Re: Serial port failure
Rob wrote:
>     try:
>         response = p.readline()
>     except SerialException:
>         print "Timed out"
>
>     try:
>         port.writelines(msg)
>     except OSError:
>         print "Serial port failure. Power cycle units"
>         port.close()
>         sys.exit(1)
>
> Does anyone have any ideas?

It'd be a good idea to print all exceptions; it can help with debugging the problem (if you don't like it going to the screen of an end user, at least write it to a log file):

    except SerialException, err:
        print err
        print "Timed out"

    except OSError, err:
        print err
        print "Serial port failure. Power cycle units"

and in your OpenPort function too.

-- Leo
Re: Roundtrip SQL data especially datetime
John Nagle wrote:
> Routinely converting MySQL DATETIME objects to Python datetime objects isn't really appropriate, because the MySQL objects have a year range from 1000 to , while Python only has the UNIX range of 1970 to 2038.

You're mistaken. The Python datetime module accepts years from 1 up to :

    datetime.MINYEAR
    1
    datetime.MAXYEAR

-- Leo
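The limits can be checked directly. A small Python 3 sketch, relying only on the documented constants of the standard datetime module (MINYEAR is 1 and MAXYEAR is 9999):

```python
import datetime

# The documented bounds of the datetime module.
year_range = (datetime.MINYEAR, datetime.MAXYEAR)

# Dates far outside the 1970 to 2038 UNIX range are representable.
earliest = datetime.datetime(1000, 1, 1)
latest = datetime.datetime.max
```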
Re: how can i write a hello world in chinese with python
kernel1983 wrote:
> and I tried unicode and utf-8

How did you try unicode? Like this?

    EasyDialogs.Message(u'\u4e2d')

> I tried both using the unicode/utf-8 head (like \xEF\xBB\xBF) and not using it. Does anyone know about the setting in the Python code file? Maybe Python doesn't know I'm using Chinese?!

It depends on how EasyDialogs works. And by the way, when you say utf-8 encoded text is not displayed correctly, what do you actually see on the screen?

-- Leo
Re: inconvenient unicode conversion of non-string arguments
Holger Joukl wrote: Hi there, I consider the behaviour of unicode() inconvenient wrt to conversion of non-string arguments. While you can do: unicode(17.3) u'17.3' you cannot do: unicode(17.3, 'ISO-8859-1', 'replace') Traceback (most recent call last): File stdin, line 1, in ? TypeError: coercing to Unicode: need string or buffer, float found This is somehow annoying when you want to convert a mixed-type argument list to unicode strings, e.g. for a logging system (that's where it bit me) and want to make sure that possible raw string arguments are also converted to unicode without errors (although by force). Especially as this is a performance-critical part in my application so I really do not like to wrap unicode() into some custom tounicode() function that handles such cases by distinction of argument types. Any reason why unicode() with a non-string argument should not allow the encoding and errors arguments? There is reason: encoding is a property of bytes, it is not applicable to other objects. Or some good solution to work around my problem? Do not put undecoded bytes in a mixed-type argument list. A rule of thumb working with unicode: decode as soon as possible, encode as late as possible. -- Leo -- http://mail.python.org/mailman/listinfo/python-list
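The rule of thumb above ("decode as soon as possible, encode as late as possible") can be sketched in Python 3 terms, where the bytes/text split is explicit. The helper name to_text and the sample arguments are hypothetical. It is exactly the kind of wrapper the original poster hoped to avoid for performance reasons, so it is shown only to make the distinction concrete:

```python
def to_text(value, encoding="iso-8859-1", errors="replace"):
    """Convert one log argument to text.

    Only bytes carry an encoding, so only bytes are decoded; every other
    object goes through str(), where the encoding arguments do not apply.
    """
    if isinstance(value, bytes):
        return value.decode(encoding, errors)
    return str(value)


mixed_args = [17.3, b"caf\xe9", "already text", None]
log_line = " | ".join(to_text(arg) for arg in mixed_args)
```

If the byte strings are decoded at the point where they enter the program, the list holds only text and non-string objects, and the type check becomes unnecessary.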
Re: inconvenient unicode conversion of non-string arguments
Holger Joukl wrote: [EMAIL PROTECTED] schrieb am 13.12.2006 11:02:30: Holger Joukl wrote: Hi there, I consider the behaviour of unicode() inconvenient wrt to conversion of non-string arguments. While you can do: unicode(17.3) u'17.3' you cannot do: unicode(17.3, 'ISO-8859-1', 'replace') Traceback (most recent call last): File stdin, line 1, in ? TypeError: coercing to Unicode: need string or buffer, float found [...] Any reason why unicode() with a non-string argument should not allow the encoding and errors arguments? There is reason: encoding is a property of bytes, it is not applicable to other objects. Ok, but I still don't see why these arguments shouldn't simply be silently ignored for non-string arguments. That's rather bizzare and sloppy approach. Should unicode(17.3, 'just-having-fun', 'I-do-not-like-errors') unicode(17.3, 'sdlfkj', 'ewrlkj', 'eoirj', 'sdflkj') work? Or some good solution to work around my problem? Do not put undecoded bytes in a mixed-type argument list. A rule of thumb working with unicode: decode as soon as possible, encode as late as possible. It's not always that easy when you deal with a tree data structure with the tree elements containing different data types and your user may decide to output root.element.subelement.whateverData. I have the problems in a logging mechanism, and it would vanish if unicode(non-string, encoding, errors) would work and just ignore the obsolete arguments. I don't really see from your example what stops you from putting unicode instead of bytes into your tree, but I can believe some libraries can cause some extra work. That's the problem with libraries, not with builtin function unicode(). Would you be happy if floating point value 17.3 would be stored as 8 bytes in your tree? After all, that is how 17.3 is actually represented in computer memory. Same story with unicode, if some library gives you raw bytes *you* have to do extra work later. -- Leo -- http://mail.python.org/mailman/listinfo/python-list
Re: How to turn of the monitor by python?
[EMAIL PROTECTED] wrote:
> I want to turn off my monitor from within Python. How do I do it? Thanks!

Do you realize that hardware management and control is OS dependent? When asking such questions, always specify the OS. Assuming you are interested in Windows, you just need to translate the C API calls described at http://www.codeproject.com/system/display_states.asp into Python. You can use ctypes (included in Python 2.5) or the Python win32 extensions.

-- Leo
Re: sys.stdin.encoding
[EMAIL PROTECTED] wrote:
> Duncan Booth wrote:
> > [EMAIL PROTECTED] wrote:
> > > The following line in my code is failing because sys.stdin.encoding is Null.
> >
> > I'll guess you mean None rather than Null.
> >
> > > This has only started happening since I started working with Pydef in Eclipse SDK. Any ideas?
> > >
> > >     uni=unicode(word,sys.stdin.encoding)
> >
> > You could give it a fallback value:
> >
> >     uni = unicode(word, sys.stdin.encoding or sys.getdefaultencoding())
> >
> > or even just:
> >
> >     uni = unicode(word, sys.stdin.encoding or 'ascii')
> >
> > which should be the same in all reasonable universes (although I did get bitten recently when someone had changed the default encoding in a system).
>
> Thanks for your help. The problem now is that I can't enter the Swedish characters åöä etc. without getting the following error:
>
>     Enter word Påe
>     Traceback (most recent call last):
>       File "C:\Documents and Settings\workspace\simple\src\main.py", line 25, in <module>
>         archive.Test()
>       File "C:\Documents and Settings\workspace\simple\src\verb.py", line 192, in Test
>         uni=unicode(word,sys.stdin.encoding or sys.getdefaultencoding())
>     UnicodeDecodeError: 'ascii' codec can't decode byte 0xe5 in position 1: ordinal not in range(128)
>
> The call to sys.getdefaultencoding() returns ascii. Since I can enter the characters åöä on the command line in Pydef/Eclipse, doesn't that mean that stdin is not ascii? What should I do?

The workaround in your case is, at the beginning of your program:

    import sys
    if hasattr(sys.stdin, 'encoding'):
        console_encoding = sys.stdin.encoding
    else:
        import locale
        locale_name, console_encoding = locale.getdefaultlocale()

and later:

    uni = unicode(word, console_encoding)

But don't think it's portable; if you use another IDE or OS, it may not work. It would be better if PyDev implemented sys.stdin.encoding.

-- Leo
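A slightly more defensive variant of the workaround, sketched in Python 3. It also covers the case where the attribute exists but is None. The function name console_encoding and the stand-in class are hypothetical; locale.getpreferredencoding is the standard-library call used for the fallback:

```python
import locale
import sys


def console_encoding(stream=None):
    """Best-effort guess of the encoding used by an input stream."""
    stream = sys.stdin if stream is None else stream
    # The attribute may be missing, or present but None, when the stream
    # has been replaced by an IDE or redirected.
    encoding = getattr(stream, "encoding", None)
    return encoding or locale.getpreferredencoding(False) or "ascii"


class NoEncodingStream:
    """Hypothetical stand-in for a replaced stdin whose encoding is None."""
    encoding = None


fallback = console_encoding(NoEncodingStream())
```

As in the original advice, the locale's preferred encoding is only a guess about what the console actually sends.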
Re: sys.stdin.encoding
Martin v. Löwis wrote:
> [EMAIL PROTECTED] wrote:
> > The following line in my code is failing because sys.stdin.encoding is Null. This has only started happening since I started working with Pydef in Eclipse SDK. Any ideas?
> >
> >     uni=unicode(word,sys.stdin.encoding)
>
> That's a problem with pydev, where the standard machinery to determine the terminal's encoding fails. I have no idea yet how to fix this.

An environment variable, TERMENCODING? Heck, maybe this will catch on and be used by other languages, libraries, terminals, etc. It's not really a Python-only problem.

-- Leo
Re: Printing Barcodes from webapp?
Burhan wrote: Hello Group: I am in the planning stages of an application that will be accessed over the web, and one of the ideas is to print a barcode that is generated when the user creates a record. The application is to track paperwork/items and uses barcodes to easily identify which paper/item belongs to which record. Is there an easy way to generate barcodes using Python -- considering the application will be printing to a printer at the client's machine? I thought of two ways this could be done; one would be to interface with the printing options of the browser to ensure that margins, headers, footers are setup properly (I have done this before using activex and IE, but with mixed results); the other would be to install some small application at the client machine that would intercept the print jobs and format them properly (taking the printing function away from the browser). Does anyone have any experience or advice? Any links I could read up on to help me find out how to program this? Another way (easier hopefully) to accomplish this? I think one of the easiest ways is to install acrobat reader and redirect client browser to a generated pdf file. http://www.reportlab.org/ has support for generating barcodes (and more) in pdf documents. -- Leo -- http://mail.python.org/mailman/listinfo/python-list
Re: Problem with imaplib (weird result if mailbox contains a %)
Antoon Pardon wrote: On 2006-11-28, Leo Kislov [EMAIL PROTECTED] wrote: Antoon Pardon wrote: This little program gives IMO a strange result. import imaplib user = cpapen cyr = imaplib.IMAP4(imap.vub.ac.be) cyr.login(cyrus, cOn-A1r) rc, lst = cyr.list('', user/%s/* % user) for el in lst: print %r % (el,) And the result is: '(\\HasNoChildren) / user/cpapen/Out' '(\\HasNoChildren) / user/cpapen/Punten' '(\\HasNoChildren) / user/cpapen/Spam' '(\\HasNoChildren) / user/cpapen/agoog to be' '(\\HasNoChildren) / user/cpapen/artistiek - kunst' '(\\HasNoChildren) / user/cpapen/copains et copinnes =x=' '(\\HasNoChildren) / user/cpapen/cp - writing' '(\\HasNoChildren) / user/cpapen/examen' '(\\HasNoChildren) / user/cpapen/important info (pass)' '(\\HasNoChildren) / user/cpapen/lesmateriaal' '(\\HasNoChildren) / user/cpapen/love - flesh for fantasy' '(\\HasNoChildren) / user/cpapen/media' '(\\HasNoChildren) / user/cpapen/music - beats' ('(\\HasNoChildren) / {25}', 'user/cpapen/newsletters %') '' '(\\HasNoChildren) / user/cpapen/organisatie - structuur' '(\\HasNoChildren) / user/cpapen/sociale wetenschappen' '(\\HasNoChildren) / user/cpapen/the closest ones to me [x]' '(\\HasNoChildren) / user/cpapen/vubrations' '(\\HasNoChildren) / user/cpapen/wm2addressbook' '(\\HasNoChildren) / user/cpapen/wm2prefs' '(\\HasNoChildren) / user/cpapen/wm2signature' What I have a problem with is the 14th and 15th line. All other entries are strings but the 14th is a tuple. and the 15th is an empty string. As far as I can tell every time a % is in the mailbox name I get this kind of result. I'm using python 2.3.3 and the imap sytem is Cyrus. Can someone explain what is going one? Is this a bug? Empty string seems to be a bug. But tuple is by design, read the docs and imap rfc. The protocol is convoluted in the first place, and so is python interface. Are there more docs than at http://www.python.org/doc/. I don't find those very helpfull in explaining this. 
> I also took a look at RFC 2060 and, to be honest, I don't find anything there to explain this difference. I only took a closer look at section 7.2.2, so maybe I should look somewhere else, but after reading section 7.2.2 I don't understand why the list method returned a tuple for this mailbox instead of the following string:
>
>     '(\\HasNoChildren) / user/cpapen/newsletters %'

This is described in section 4.3. imaplib is too close to the protocol. It should interpret the response for each command separately. For example, the list method could return a list of tuples like:

    ('\\HasNoChildren', '/', 'user/cpapen/newsletters %')

Without this abstraction level in imaplib you have to build it yourself.

> > > If it is, is it fixed in later versions?
> >
> > Why don't you try to pull imaplib.py from later versions? I don't think it has changed that much, so it should be compatible with Python 2.3.
>
> I could get my hands on a 2.4 version and the result was the same.

I was talking only about the empty string response. Is it still there? Anyway, this issue requires investigation. That could also be a bug in the server.

-- Leo
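A rough sketch of the "build it yourself" layer for the list response, in Python 3, where imaplib returns bytes. The function name and the sample data are hypothetical and modelled on the output quoted in the thread. It joins an IMAP literal (the tuple form) back onto its header line and drops the empty remainder that follows it:

```python
import re

# A trailing "{N}" on a response line announces an N-byte literal.
_LITERAL_MARKER = re.compile(rb"\{\d+\}$")


def flatten_list_response(entries):
    """Turn imaplib LIST data into one bytes line per mailbox."""
    flat = []
    for entry in entries:
        if isinstance(entry, tuple):
            header, literal = entry
            # Replace the "{N}" marker with the literal it announced.
            flat.append(_LITERAL_MARKER.sub(b"", header) + literal)
        elif entry:
            flat.append(entry)
    return flat


# Hypothetical data shaped like the response shown in the thread.
sample = [
    b'(\\HasNoChildren) "/" "user/cpapen/Out"',
    (b'(\\HasNoChildren) "/" {25}', b"user/cpapen/newsletters %"),
    b"",
]
mailboxes = flatten_list_response(sample)
```

This only normalises the shape of the data; splitting each line into flags, delimiter and name is a further step, and a non-empty remainder after a literal would need extra handling.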
Re: How to increase the speed of this program?
Peter Otten wrote: Peter Otten wrote: HYRY wrote: I want to join two mono wave file to a stereo wave file by only using the default python module. Here is my program, but it is much slower than the C version, so how can I increase the speed? I think the problem is at line #1, #2, #3. oarray = array.array(h, [0]*(len(larray)+len(rarray))) #1 ITEMSIZE = 2 size = ITEMSIZE*(len(larray) + len(rarray)) oarray = array.array(h) oarray.fromstring(\0 * size) may be a bit faster. Confirmed: $ python2.5 -m timeit -s'from array import array; N = 10**6' 'a = array(h); a.fromstring(\0*(2*N))' 100 loops, best of 3: 9.68 msec per loop $ python2.5 -m timeit -s'from array import array; N = 10**6' 'a = array(h, [0]*N);' 10 loops, best of 3: 199 msec per loop Funny thing is that using huge temporary string is faster that multiplying small array: C:\Python25python -m timeit -sfrom array import array; N = 10**6 a =array('h'); a.fromstring('\0'*(2*N)) 100 loops, best of 3: 9.57 msec per loop C:\Python25python -m timeit -sfrom array import array; N = 10**6 a = array('h','\0\0'); a*N 10 loops, best of 3: 28.4 msec per loop Perhaps if array multiplication was as smart as string multiplication then array multiplication version would be the fastest. -- Leo -- http://mail.python.org/mailman/listinfo/python-list
Re: How to increase the speed of this program?
HYRY wrote:
> Peter Otten wrote:
> > HYRY wrote:
> > > I want to join two mono wave files into a stereo wave file using only the default Python modules. Here is my program, but it is much slower than the C version, so how can I increase the speed? I think the problem is at lines #1, #2, #3.
> > >
> > >     oarray = array.array("h", [0]*(len(larray)+len(rarray))) #1
> >
> >     ITEMSIZE = 2
> >     size = ITEMSIZE*(len(larray) + len(rarray))
> >     oarray = array.array("h")
> >     oarray.fromstring("\0" * size)
> >
> > may be a bit faster.
> >
> > Peter
>
> Thank you very much, that is just what I want.

Even faster:

    oarray = larray + rarray

    C:\Python25>python -m timeit -s"from array import array; N = 10**6" "a =array('h'); a.fromstring('\0'*(2*N))"
    100 loops, best of 3: 9.57 msec per loop

    C:\Python25>python -m timeit -s"from array import array; N = 10**6; b = array('h', [0])*(N/2); c = b[:]" "a = b + c"
    100 loops, best of 3: 5.7 msec per loop

-- Leo
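Putting the allocation trick together with the rest of the job: a sketch in Python 3, assuming the goal is the usual left/right sample interleaving of a stereo stream (the original program is not shown in full, so this is a guess at its purpose). The function name interleave is hypothetical:

```python
from array import array


def interleave(left, right):
    """Merge two equal-length mono channels into one L/R interleaved array."""
    if len(left) != len(right):
        raise ValueError("channels must have the same number of samples")
    # Allocate the output in one step by concatenation, then fill it with
    # extended-slice assignment, which copies in C instead of a Python loop.
    out = left + right
    out[0::2] = left
    out[1::2] = right
    return out


stereo = interleave(array("h", [1, 2, 3]), array("h", [-1, -2, -3]))
```

The concatenation serves only to obtain an array of the right size and type; its initial contents are overwritten by the two slice assignments.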
Re: Modifying every alternate element of a sequence
[EMAIL PROTECTED] wrote:
> I have a list of numbers and I want to build another list with every second element multiplied by -1.
>
>     input = [1,2,3,4,5,6]
>     wanted = [1,-2,3,-4,5,-6]
>
> I can implement it like this:
>
>     input = range(3,12)
>     wanted = []
>     for (i,v) in enumerate(input):
>         if i%2 == 0:
>             wanted.append(v)
>         else:
>             wanted.append(-v)
>
> But is there a better way to do this?

Use slices:

    input[1::2] = [-item for item in input[1::2]]

If you don't want to do it in-place, just make a copy:

    wanted = input[:]
    wanted[1::2] = [-item for item in wanted[1::2]]

-- Leo
Re: os.walk return hex excapes
Alex S wrote:
> Hi, os.walk returns hex escape sequences inside a file name, and when I try to feed it back to os.remove I get
>
>     OSError: [Errno 22] Invalid argument: 'C:\\Temp\\?p?\xbfS\xbf\xac?G\xaba ACDSee \xbb?a??n a???\xac\xb5\xbfn.exe'

It's not the escape sequences that are the problem but the question marks, I suspect. Most likely this file name contains characters that are not in your locale's language. To access this file name you need to use unicode: just make sure the first parameter of os.walk is a unicode string, for example os.walk(u'c:\\temp'). The exact code to make the first parameter unicode depends on where it is coming from (network, config file, registry, etc.). Reading a unicode tutorial is highly recommended.

-- Leo
Re: Modifying every alternate element of a sequence
[EMAIL PROTECTED] wrote:
> Wow, I was in fact searching for this syntax in the Python tutorial. It is missing there. Is there a reference page which documents all possible list comprehensions?

There are actually only two forms of list comprehension (http://docs.python.org/ref/lists.html):

    [blah for x in expr]

and

    [blah for x in expr if cond]

And here is the reference page for slicing (note, it's not a list comprehension): http://docs.python.org/ref/slicings.html

-- Leo
Re: Dynamic/runtime code introspection/compilation
Thomas W wrote:
> Maybe a stupid subject, but this is what I want to do: I've got some Python code stored in a string:
>
>     somecode = """
>     from somemodule import ISomeInterface
>
>     class Foo(ISomeInterface):
>         param1 = ...
>         param2 =
>     """
>
> and I want to compile that code so that I can use the Foo class and check what class it extends, in this case ISomeInterface etc. I've tried eval, codeop etc. but it doesn't work. Something like this would be nice:
>
>     from somemodule import ISomeInterface
>     d = compile(sourcecode)
>     myfoo = d.Foo()
>     print ISomeInterface in myfoo.__bases__
>
> Any hints?

Here is a hello world program for plugins:

    import sys

    somecode = """
    class Foo:
        param1 = "Hello, world!"
    """

    plugin = type(sys)('unknown_plugin')  # Create new empty module
    exec somecode in plugin.__dict__
    print plugin.Foo.param1

-- Leo
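The same plugin idea in Python 3 syntax, extended with the base-class check the question asked about. The class ISomeInterface is defined inline here as a hypothetical stand-in for the poster's interface, and types.ModuleType is the documented spelling of type(sys):

```python
import types


class ISomeInterface:
    """Hypothetical base class standing in for the poster's interface."""


somecode = '''
class Foo(ISomeInterface):
    param1 = "Hello, world!"
'''

plugin = types.ModuleType("unknown_plugin")  # new, empty module object
plugin.ISomeInterface = ISomeInterface       # make the base class visible to the code
exec(somecode, plugin.__dict__)              # run the source inside the module namespace

extends_interface = issubclass(plugin.Foo, ISomeInterface)
```

Note that exec runs whatever the string contains, so this is only appropriate for code from a trusted source.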
Re: Problem with imaplib (weird result if mailbox contains a %)
Antoon Pardon wrote: This little program gives IMO a strange result. import imaplib user = cpapen cyr = imaplib.IMAP4(imap.vub.ac.be) cyr.login(cyrus, cOn-A1r) rc, lst = cyr.list('', user/%s/* % user) for el in lst: print %r % (el,) And the result is: '(\\HasNoChildren) / user/cpapen/Out' '(\\HasNoChildren) / user/cpapen/Punten' '(\\HasNoChildren) / user/cpapen/Spam' '(\\HasNoChildren) / user/cpapen/agoog to be' '(\\HasNoChildren) / user/cpapen/artistiek - kunst' '(\\HasNoChildren) / user/cpapen/copains et copinnes =x=' '(\\HasNoChildren) / user/cpapen/cp - writing' '(\\HasNoChildren) / user/cpapen/examen' '(\\HasNoChildren) / user/cpapen/important info (pass)' '(\\HasNoChildren) / user/cpapen/lesmateriaal' '(\\HasNoChildren) / user/cpapen/love - flesh for fantasy' '(\\HasNoChildren) / user/cpapen/media' '(\\HasNoChildren) / user/cpapen/music - beats' ('(\\HasNoChildren) / {25}', 'user/cpapen/newsletters %') '' '(\\HasNoChildren) / user/cpapen/organisatie - structuur' '(\\HasNoChildren) / user/cpapen/sociale wetenschappen' '(\\HasNoChildren) / user/cpapen/the closest ones to me [x]' '(\\HasNoChildren) / user/cpapen/vubrations' '(\\HasNoChildren) / user/cpapen/wm2addressbook' '(\\HasNoChildren) / user/cpapen/wm2prefs' '(\\HasNoChildren) / user/cpapen/wm2signature' What I have a problem with is the 14th and 15th line. All other entries are strings but the 14th is a tuple. and the 15th is an empty string. As far as I can tell every time a % is in the mailbox name I get this kind of result. I'm using python 2.3.3 and the imap sytem is Cyrus. Can someone explain what is going one? Is this a bug? Empty string seems to be a bug. But tuple is by design, read the docs and imap rfc. The protocol is convoluted in the first place, and so is python interface. If it is, is it fixed in later versions? Why don't you try to pull imaplib.py from later versions? 
I don't think it has changed that much, so it should be compatible with Python 2.3.

> Whether or not it is a bug, can I rely on the mailbox being the last item in the tuple in these cases?

Yes (at least for the list command).

-- Leo
Re: Email headers and non-ASCII characters
Christoph Haas wrote:
> Hello, everyone... I'm trying to send an email to people with non-ASCII characters in their names. A recipient's address may look like:
>
>     Jörg Nørgens [EMAIL PROTECTED]
>
> My example code:
>
>     def sendmail(sender, recipient, body, subject):
>         message = MIMEText(body)
>         message['Subject'] = Header(subject, 'iso-8859-1')
>         message['From'] = Header(sender, 'iso-8859-1')
>         message['To'] = Header(recipient, 'iso-8859-1')
>         s = smtplib.SMTP()
>         s.connect()
>         s.sendmail(sender, recipient, message.as_string())
>         s.close()
>
> However, the Header() method encodes the whole expression in ISO-8859-1:
>
>     =?iso-8859-1?q?=22J=C3=B6rg_N=C3=B8rgens=22_=3Cjoerg=40nowhere=3E?=
>
> whereas I had expected something like:
>
>     =?utf-8?q?J=C3=B6rg?= =?utf-8?q?_N=C3=B8rgens?= [EMAIL PROTECTED]
>
> Of course my mail transfer agent is not happy with the first string, although I see that Header() is just doing its job. I'm looking for a way, though, to encode just the non-ASCII parts like any mail client does. Does anyone have a recipe for how to do that? Or is there a method in the email module of the standard library that does what I need? Or should I split by regular expression to extract the email address beforehand? Or use a list comprehension to just look for non-ASCII characters and Header() them? Sounds dirty.

Why dirty?

    from email.Header import Header
    from itertools import groupby

    h = Header()
    addr = u'Jörg Nørgens [EMAIL PROTECTED]'

    def is_ascii(char):
        return ord(char) < 128

    for ascii, group in groupby(addr, is_ascii):
        h.append(''.join(group), "latin-1")

    print h
    => J =?iso-8859-1?q?=F6?= rg N =?iso-8859-1?q?=F8?= rgens [EMAIL PROTECTED]

-- Leo
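For readers on Python 3, the standard library can split the display name from the address itself. A minimal sketch using email.utils.formataddr with its charset parameter; the address below is a placeholder and not the one from the post:

```python
from email.header import decode_header
from email.utils import formataddr

name = "Jörg Nørgens"
address = "joerg@example.invalid"  # placeholder address, not from the post

# Only the display name is RFC 2047 encoded; the address stays plain ASCII.
to_header = formataddr((name, address), charset="utf-8")

# Decode the encoded display name again to check the round trip.
encoded_name = to_header.rsplit(" <", 1)[0]
decoded_bytes, decoded_charset = decode_header(encoded_name)[0]
decoded_name = decoded_bytes.decode(decoded_charset)
```

The resulting string can be assigned to message['To'] directly, which keeps the angle-bracketed address readable for the mail transfer agent.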
Re: A python IDE for teaching that supports cyrillic i/o
Kirill Simonov wrote: Hi, Could anyone suggest me a simple IDE suitable for teaching Python as a first programming language to high school students? It is necessary that it has a good support for input/output in Cyrillic. Unfortunately, most IDEs I tried failed miserably in this respect. My test was simple: I've run the code name = raw_input(What's your name? ) # written in Russian print Hello, %s! % name # in Russian as well both from the shell and as a standalone script. This either caused a UnicodeError or just printed invalid characters. For the record, I've checked IDLE, PythonWin, Eric, DrPython, SPE, and WingIDE. The only ones that worked are WingIDE and IDLE (under Linux, but not under Windows). IDLE on Windows works fine for your example in interactive console: name = raw_input(What's your name? ) What's your name? Леонид print name Леонид name u'\u041b\u0435\u043e\u043d\u0438\u0434' and as a script: What's your name? Леонид Hello, Леонид! type 'unicode' That is IDLE + python 2.4 on Windows. So I'm not sure what is the problem. In other messages you seems to be talking about system console. Why? It's not part of IDE. And another question: are you aware of the fact that recommended way to handle non-ascii characters is to use unicode type? Most of IDEs should work fine with unicode. -- Leo -- http://mail.python.org/mailman/listinfo/python-list
Re: A python IDE for teaching that supports cyrillic i/o
Kirill Simonov wrote:
> On Sun, Nov 19, 2006 at 03:27:32AM -0800, Leo Kislov wrote:
> > IDLE on Windows works fine for your example in the interactive console:
> >
> >     name = raw_input("What's your name? ")
>
> Have you tried to use Cyrillic characters in a Python string in the interactive console? When I do it, I get the "Unsupported characters in input" error. For instance,
>
>     print "Привет"  # That's "Hi" in Russian.
>     Unsupported characters in input

That works for me in Win XP English, with the Russian locale and Russian as the language for non-unicode programs. Didn't you say you want to avoid unicode? If so, you need to set the proper locale and language for non-unicode programs.

> > And another question: are you aware of the fact that the recommended way to handle non-ASCII characters is to use the unicode type? Most IDEs should work fine with unicode.
>
> Usually using the unicode type gives you much more headache than benefit unless you are careful enough never to mix unicode and str objects.

For a professional programmer life is full of headaches like this :) For high school students it could be troublesome and annoying, I agree.

> Anyway, I just want the interactive console of an IDE to behave like a real Python console under a UTF-8 terminal (with sys.stdout.encoding == 'utf-8').

Do you realize that a utf-8 locale makes the len() function and slicing of byte strings look strange to high school students?

    hi = u"Привет".encode("utf-8")
    r = u"р".encode("utf-8")
    print len(hi)      # prints 12
    print hi[1] == r   # prints False
    for char in hi:
        print char     # prints garbage

As I see it, you have several options:

1. Set the Russian locale and Russian as the language for non-unicode programs on Windows.
2. Introduce students to unicode.
3. Wait for Python 3.0.
4. Hack some IDE to make a unicode-friendly environment: unicode literals by default, type("Привет") == unicode, unicode stdin/stdout, open() using utf-8 encoding by default for text files, etc.

-- Leo
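The difference between counting characters and counting bytes, which the snippet above demonstrates with Python 2 byte strings, can be seen side by side in Python 3, where text and bytes are separate types. A small sketch:

```python
hi = "Привет"
hi_bytes = hi.encode("utf-8")

char_count = len(hi)        # counts characters
byte_count = len(hi_bytes)  # counts UTF-8 bytes; each Cyrillic letter takes two

second_char = hi[1]          # indexing text yields a whole character
second_byte = hi_bytes[1:2]  # slicing bytes yields only part of a character
matches_letter = second_byte == "р".encode("utf-8")
```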
Re: os.lisdir, gets unicode, returns unicode... USUALLY?!?!?
Martin v. Löwis wrote: Leo Kislov schrieb: How about returning two lists, first list contains unicode names, the second list contains undecodable names: files, troublesome = os.listdir(separate_errors=True) and make separate_errors=True by default in python 3.0 ? That would be quite an incompatible change, no? Yeah, that was idea-dump. Actually it is possible to make this idea mostly backward compatible by making os.listdir() return only unicode names and os.binlistdir() return only binary directory entries. Unfortunately the same trick will not work for getcwd. Another idea is to map all 256 bytes to unicode private code points. When a file name cannot be fully decoded the undecoded bytes will be mapped to specially allocated code points. Unfortunately this idea seems to leak if the program later wants to write such unicode string to a file. Python will have to throw an exception since we don't know if it is ok to write broken string to a file. So we are back to square one, programs need to deal with filesystem garbage :( -- Leo -- http://mail.python.org/mailman/listinfo/python-list
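The second idea above, mapping undecodable bytes to reserved code points, is close to what Python 3 later adopted as the "surrogateescape" error handler (PEP 383). A small sketch, with a hypothetical file name, showing both the lossless round trip and the "leak" that the post worries about:

```python
raw_name = b"report-\xff.txt"  # hypothetical name that is not valid UTF-8

# Each undecodable byte 0xNN becomes the lone surrogate U+DCNN.
text_name = raw_name.decode("utf-8", "surrogateescape")

# The original bytes can be recovered exactly...
round_trip = text_name.encode("utf-8", "surrogateescape")

# ...but a strict encode, as used when writing text to a file, fails.
try:
    text_name.encode("utf-8")
    strict_encode_ok = True
except UnicodeEncodeError:
    strict_encode_ok = False
```

So the program can pass such names back to the filesystem unchanged, but it still has to decide what to do when one of them must be displayed or stored as text.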
Re: how to print pdf with python on a inkjet printer.
krishnakant Mane wrote: hello all. I am developing an ncurses based python application that will require to create pdf reports for printing. I am not using py--qt or wx python. it is a consol based ui application and I need to make a pdf report and also send it to a lazer or ink jet printer. is it possible to do so with python? or is it that I will have to use the wxpython library asuming that there is a print dialog which can open up the list of printers? if wx python and gui is the only way then it is ok but I will like to keep this application on the ncurses side. Assuming you are on a UNIX-like system, you really need to setup CUPS http://www.cups.org/ (or may be your system already provides CUPS). PDF seems to be the future intermediate format for UNIX printing http://www.linux.com/article.pl?sid=06/04/18/2114252 and CUPS already supports printing PDF files, just run lp your_file.pdf to print a file. CUPS only have command line interface: http://www.cups.org/documentation.php/options.html -- Leo -- http://mail.python.org/mailman/listinfo/python-list
Re: how to print pdf with python on a inkjet printer.
Leo Kislov wrote: CUPS only have command line interface: http://www.cups.org/documentation.php/options.html My mistake: CUPS actually has official C API http://www.cups.org/documentation.php/api-cups.html and unofficial python bindings http://freshmeat.net/projects/pycups/. -- http://mail.python.org/mailman/listinfo/python-list
Re: os.lisdir, gets unicode, returns unicode... USUALLY?!?!?
Martin v. Löwis wrote: gabor schrieb: All this code will typically work just fine with the current behavior, so people typically don't see any problem. i am sorry, but it will not work. actually this is exactly what i did, and it did not work. it dies in the os.path.join call, where file_name is converted into unicode. and python uses 'ascii' as the charset in such cases. but, because listdir already failed to decode the file_name with the filesystem-encoding, it usually also fails when tried with 'ascii'. Ah, right. So yes, it will typically fail immediately - just as you wanted it to do, anyway; the advantage with this failure is that you can also find out what specific file name is causing the problem (whereas when listdir failed completely, you could not easily find out the cause of the failure). How would you propose listdir should behave? How about returning two lists, first list contains unicode names, the second list contains undecodable names: files, troublesome = os.listdir(separate_errors=True) and make separate_errors=True by default in python 3.0 ? -- Leo -- http://mail.python.org/mailman/listinfo/python-list
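For readers on Python 3, the "two lists" idea can be sketched in user code. This is only an illustration under assumptions: listdir_separate is a hypothetical helper name, os.listdir itself has no separate_errors parameter, and the filesystem encoding is taken from sys.getfilesystemencoding().

```python
import os
import sys
import tempfile


def listdir_separate(path, encoding=None):
    """Return (decoded_names, undecodable_names) for a directory.

    Hypothetical helper, not part of the os module.
    """
    encoding = encoding or sys.getfilesystemencoding()
    files, troublesome = [], []
    # Passing a bytes path makes os.listdir return the raw byte names.
    for raw in os.listdir(os.fsencode(path)):
        try:
            files.append(raw.decode(encoding))
        except UnicodeDecodeError:
            # Keep the name as bytes so the caller can still open the file.
            troublesome.append(raw)
    return sorted(files), sorted(troublesome)


def demo():
    # Build a throwaway directory with two well-formed names.
    with tempfile.TemporaryDirectory() as tmp:
        for name in ("b.txt", "a.txt"):
            open(os.path.join(tmp, name), "w").close()
        return listdir_separate(tmp)


if __name__ == "__main__":
    print(demo())
```

The caller decides what to do with the second list (report it, skip it, or pass the bytes names straight back to os functions), which is the explicit choice the thread is asking for.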
Re: os.lisdir, gets unicode, returns unicode... USUALLY?!?!?
gabor wrote: Martin v. Löwis wrote: gabor schrieb: i also recommend this approach. also, raising an exception goes well with the principle of the least surprise imho. Are you saying you wouldn't have been surprised if that had been the behavior? yes, i would not have been surprised. because it's kind-of expected when dealing with input, that malformed input raises an unicode-exception. and i would also expect, that if os.listdir completed without raising an exception, then the returned data is correct. The problem is that most programmers just don't want to deal with filesystem garbage but they won't be happy if the program breaks either. How would you deal with that exception in your code? depends on the application. in the one where it happened i would just display an error message, and tell the admins to check the filesystem-encoding. (in other ones, where it's not critical to get the correct name, i would probably just convert the text to unicode using the replace behavior) what about using flags similar to how unicode() works? strict, ignore, replace and maybe keep-as-bytestring. like: os.listdir(dirname,'strict') That's actually an interesting idea. The error handling modes could be: 'mix' -- current behaviour, 'ignore' -- drop names that cannot be decoded, 'separate' -- see my other message. -- Leo -- http://mail.python.org/mailman/listinfo/python-list
Re: Problem reading with bz2.BZ2File(). Bug?
Clodoaldo Pinto Neto wrote: Fredrik Lundh wrote: Clodoaldo Pinto Neto wrote: The offending file is 5.5 MB. Sorry, i could not reproduce this problem with a smaller file. but surely you can post the repr() of the last two lines? This is the output: $ python bzp.py line number: 588317 '\x07' '' Confirmed on windows with 2.4 and 2.5: C:\p\Python24\python.exe bzp.py line number: 588317 '\x1e' '' C:\p\Python25\python.exe bzp.py line number: 588317 '\x1e' '' Looks like one byte of garbage is appended at the end of file. Please file a bug report. As a workaround rU mode seems to work fine for this file. -- Leo -- http://mail.python.org/mailman/listinfo/python-list
Re: str.title question after '
Antoon Pardon wrote: I have a text in ascii. I use the ' for an apostrophe. The problem is that this gives problems with the title method. I don't want letters after a ' to be uppercased. Here are some examples:
    argument        result          expected
    't smidje       'T Smidje       't Smidje
    na'ama          Na'Ama          Na'ama
    al pi tnu'at    Al Pi Tnu'At    Al Pi Tnu'at
Is there an easy way to get what I want?
    import re
    def title_words(s):
        # split on whitespace, keeping the separators in the list
        words = re.split(r'(\s+)', s)
        # uppercase only the first character of each piece
        return ''.join(word[0:1].upper() + word[1:] for word in words)
Should the current behaviour be considered a bug?
I believe it follows the definition of \w from the re module.
I would be inclined to answer yes, but that may be because this behaviour would be wrong in Dutch. I'm not so sure about English.
The problem is more complicated. First of all, why should title() be limited to human languages? What about programming languages? Is bar.bar.spam three tokens or one in a foo programming language? There are some problems with human languages too: how are you going to process out-of-the-box and italian-american? -- Leo
Re: Character Encodings and display of strings
JKPeck wrote: It seemed to me that this sentence "For many types, this function makes an attempt to return a string that would yield an object with the same value when passed to eval()." might mean that the encoding setting of the source file might influence how repr represented the contents of the string. Nothing to do with Unicode. If a source file could have a declared encoding of, say, cp932 via the # coding comment, I thought there was a chance that eval would respond to that, too.
Not a chance :) Encoding is a property of an input/output object (console, web page, plain text file, MS Word file, etc.). All input/output objects have specific rules determining their encoding; there is absolutely no connection between the encoding of the source file and any other input/output object. repr escapes bytes 128..255 because it doesn't know where you're going to output its result, so repr uses the safest encoding: ascii. -- Leo
Re: comparing Unicode and string
Neil Cerutti wrote: On 2006-11-10, Steve Holden [EMAIL PROTECTED] wrote: But I don't insist on my PEP. The example just shows another pitfall with Unicode and why I'd advise any beginner: never write text constants that contain non-ascii chars as simple strings, always make them Unicode strings by prepending the u. That doesn't do any good if you aren't writing them in unicode code points, though. You tell the interpreter what encoding your source code is in. It then knows precisely how to decode your string literals into Unicode. How do you write things in Unicode code points?
    for = u"f\xfcr"
Unless you're using a unicode-unfriendly editor or console, u"f\xfcr" is the same as u"für":
    >>> u"f\xfcr" is u"für"
    True
So there is no need to write unicode strings in hexadecimal representation of code points. -- Leo
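A small Python 3 sketch of the same point, where every str is already Unicode. It compares with == rather than is, since equality is what matters here (identity of two equal literals is an interpreter detail). The helper name code_points is made up for the illustration.

```python
# Two spellings of the same three code points.
escaped = "f\xfcr"  # code point written as a hex escape
named = "f\N{LATIN SMALL LETTER U WITH DIAERESIS}r"  # same code point by name


def code_points(text):
    """Return the code points of text as hex strings."""
    return [hex(ord(ch)) for ch in text]


if __name__ == "__main__":
    print(escaped == named)
    print(code_points(escaped))
```

Typing the character directly in a source file gives the same string as long as the editor saves the file in the encoding the interpreter expects.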
Re: Erronous unsupported locale setting ?
robert wrote: Why can the default locale not be set by its true name? but only by '' ? : Probably it is just not implemented. But since locale names are system specific (For example windows accepts 'ch' as Chinese in Taiwan, where as IANA http://www.iana.org/assignments/language-subtag-registry considers it Chamorro) setlocale should probably grow an additional keyword parameter: setlocale(LC_ALL, iana='de-DE') -- Leo -- http://mail.python.org/mailman/listinfo/python-list
Re: Erronous unsupported locale setting ?
robert wrote: Leo Kislov wrote: robert wrote: Why can the default locale not be set by its true name, but only by ''?
Probably it is just not implemented. But since locale names are system specific (for example, Windows accepts 'ch' as Chinese in Taiwan, whereas IANA http://www.iana.org/assignments/language-subtag-registry considers it Chamorro) setlocale should probably grow an additional keyword parameter: setlocale(LC_ALL, iana='de-DE')
that'd be another fat database to blow up the python core(s). I just wonder why locale.setlocale(locale.LC_ALL, 'de_DE') doesn't accept the name which locale.getlocale() / getdefaultlocale() already deliver: ('de_DE', 'cp1252')?
It is documented that those functions return a cross-platform RFC 1766 language code. This code sometimes won't be compatible with the OS-specific locale name. A cross-platform code can be useful if you want to create your own locale database, for example cross-platform language packs. Right now we have:
    setlocale(category) -- get (it's not a typo) OS locale name
    getlocale(category) -- get cross platform locale name
    setlocale(category, '') -- enable default locale, return OS locale name
    getdefaultlocale() -- get cross platform locale name
I agree it's a very confusing API, especially setlocale acting like a getter, but that's what we have. Improvement is welcome. -- Leo
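The "setlocale as a getter" behaviour can be checked with a short sketch. It only queries and never changes the locale; current_os_locale is a hypothetical wrapper name, and the string it returns is whatever the OS reports, so no particular value is assumed.

```python
import locale


def current_os_locale(category=locale.LC_COLLATE):
    """Return the OS locale name for category without changing it.

    Calling setlocale with no second argument is a pure query.
    """
    return locale.setlocale(category)


# Querying twice gives the same answer because nothing was modified.
before = current_os_locale()
after = current_os_locale()

if __name__ == "__main__":
    print(before, after)
```

Only a call with a second argument, such as setlocale(category, ''), actually switches the locale.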
Re: How to test python extension modules during 'make check' / 'make distcheck'?
Mark Asbach wrote: Hi pythonians, I'm one of the maintainers of an open source image processing toolkit (OpenCV) and responsible for parts of the autotools setup. The package mainly consists of four shared libraries but is accompanied by a python package containing some pure python code and of course extension modules for the four libraries. Now during the last month we were preparing a major release which means testing, fixing, testing, fixing, ... in the first degree. Typical functionality of the shared libraries is verified during 'make check' and 'make distcheck' by binaries that are linked against the libraries (straight forward) and are listed in the 'TESTS' automake primary. Unfortunately, many problems with the python wrappers arose from time to time. Currently we have to build and install before we can run any python-based test routines. When trying to integrate python module testing into the automake setup, there are some problems that I couldn't find a solution for: a) the extension modules are built in different (other) subdirectories - so they are not in the local path where python could find them As I understand it's not python that cannot find them but dynamic linker. On ELF UNIX systems you can set LD_LIBRARY_PATH to help linker find dependencies, on Windows -- PATH. If you need details, you can find them in dynamic linker manuals. b) the libraries and extension modules are built with libtool and may have rpaths compiled in (is this problematic)? libtools seems to have some knobs to cope with rpath: http://sourceware.org/ml/bug-glibc/2000-01/msg00058.html c) a different version of our wrappers might be installed on the testing machine, somewhere in python/site-packages. How can I make sure that python only finds my 'new' local generated modules? Set PYTHONPATH to the directory where locally generated modules are located. They will be found before site packages. -- Leo -- http://mail.python.org/mailman/listinfo/python-list
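The PYTHONPATH suggestion for point c) can be sketched as a small test driver. Assumptions: the build directory is a temporary directory here, local_wrapper is a made-up stand-in for a freshly built wrapper module, and the child interpreter is the one running the script. A real setup would also need LD_LIBRARY_PATH (or PATH on Windows) for the shared libraries, as discussed under a).

```python
import os
import subprocess
import sys
import tempfile


def run_with_local_modules(build_dir, code):
    """Run code in a child interpreter that searches build_dir first."""
    env = dict(os.environ)
    old = env.get("PYTHONPATH")
    # PYTHONPATH entries are searched before site-packages.
    env["PYTHONPATH"] = build_dir if not old else build_dir + os.pathsep + old
    result = subprocess.run(
        [sys.executable, "-c", code],
        env=env, capture_output=True, text=True, check=True,
    )
    return result.stdout.strip()


def demo():
    with tempfile.TemporaryDirectory() as build_dir:
        # Stand-in for a locally generated module (hypothetical name).
        with open(os.path.join(build_dir, "local_wrapper.py"), "w") as fh:
            fh.write("VERSION = 'local build'\n")
        return run_with_local_modules(
            build_dir, "import local_wrapper; print(local_wrapper.VERSION)"
        )


if __name__ == "__main__":
    print(demo())
```

In an automake setup the same effect is usually obtained by exporting the variable in the test environment rather than from Python.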
Re: Lookuperror : unknown encoding : utf-8
Sachin Punjabi wrote: On Oct 30, 1:29 pm, Fredrik Lundh [EMAIL PROTECTED] wrote: Sachin Punjabi wrote: The OS is Windows XP
then your installation is seriously broken. where did you get the installation kit? have you removed stuff from the Lib directory? /F
It was already installed on my PC and I have no clue how it was installed or whether any changes have been made.
Then it's a distribution from your PC manufacturer. They could have omitted some modules, like the utf-8 codec.
I am just downloading a newer version from python.org and will install and check it. I think there must be a problem with the installation itself.
That's the right idea. I'd also recommend leaving the manufacturer's python distribution alone. Do not remove it, do not upgrade it; some programs provided by the manufacturer could stop working. If the preinstalled python was installed into the c:\python24 directory, choose some other directory when you install python from python.org. -- Leo
Re: Lookuperror : unknown encoding : utf-8
Sachin Punjabi wrote: I installed it again but it makes no difference. It still throws me error for LookUp Error: unknown encoding : utf-8. Most likely you're not using the new python, you're still running old one. -- Leo -- http://mail.python.org/mailman/listinfo/python-list
Re: subprocess decoding?
MC wrote: Hi! On win-XP (French), when I read subprocess output (stdout), I must use different decodings (cp1252, cp850, cp437, or no decoding), depending on how the same Python script is launched: - from the command line - from start+run - from an icon - by a Python COM server - etc. (.py vs .pyw can also contribute). How can I know, on the fly, the encoding used by the subprocess?
You can't. Consider a Windows equivalent of the UNIX cat program. It just dumps the content of a file to stdout. So the problem of finding out the encoding of stdout is equal to finding out the encoding of any file. It's just impossible to do in general. Now, maybe you are talking about conventions. AFAIK, since Windows doesn't have a strong command line culture, it doesn't have such conventions. -- Leo
Re: Lookuperror : unknown encoding : utf-8
Sachin Punjabi wrote: Hi, I wanted to read a file encoded in utf-8 and am using the following syntax in my source, which throws an error: LookupError: unknown encoding: utf-8. I am working with Python version 2.4.1.
    import codecs
    fileObj = codecs.open("data.txt", "r", "utf-8")
Can anyone please guide me on how to get utf-8 activated in my codecs, or on any setting that needs to be made before using codecs?
What OS? Where did you get your python distribution? Anyway, I believe the utf-8 codec has been in the python.org distribution since the introduction of unicode (around python 2.0). If you can't use the utf-8 codec right out of the box, something is really wrong with your setup. -- Leo
Re: gettext on Windows
[EMAIL PROTECTED] wrote: Martin v. Löwis wrote: [EMAIL PROTECTED] schrieb:
    Traceback (most recent call last):
      File "panicbutton.py", line 36, in ?
        lan = gettext.GNUTranslations (open (sLang, "rb"))
      File "C:\Python24\lib\gettext.py", line 177, in __init__
        self._parse(fp)
      File "C:\Python24\lib\gettext.py", line 280, in _parse
        raise IOError(0, 'File is corrupt', filename)
    IOError: [Errno 0] File is corrupt: 'locale\\fr_FR.mo'
If it says so, it likely is right. How did you create the file?
I only get the "File is corrupt" error when I changed
    lan = gettext.GNUTranslations (open (sLang))
This code definitely corrupts .mo files, since on windows files are opened in text mode by default.
to
    lan = gettext.GNUTranslations (open (sLang, "rb"))
Without the "rb" in the open () I get a "struct.error : unpack str size does not match format" error (see original post).
struct.error usually means the input data doesn't correspond to the expected format.
The .mo files were created using poEdit (www.poedit.org), and I get the same error with various translations, all created by different people.
Try msgunfmt http://www.gnu.org/software/gettext/manual/html_node/gettext_128.html#SEC128 to see if it can convert your files back to text. -- Leo
Re: my first software
[EMAIL PROTECTED] wrote: I am a beginner at programming and started to learn Python a week ago. Over the last 3 days I wrote this little tool for Renju. If you have any advice on my code, please tell me.
    s = ''
    for i in range (0,len(done) - 1):
        s = s +str(done[i][0]) + str(done[i][1]) + '\n'
    s = s + str(done[len(done) - 1][0]) + str(done[len(done) - 1][1])
This is easier to do with a generator expression and the join method:
    s = '\n'.join(str(item[0]) + str(item[1]) for item in done)
    for i in range (0, len(s)):
        x = s[i][0]
        ...
        if i%2 == 0:
            ...
There is a builtin function enumerate for this case; IMHO it's slightly easier to read:
    for i, item in enumerate(s):
        x = item[0]
        ...
        if not i%2:
            ...
    if len(done) != 0 and beensaved == 0 and askyesno(...):
        saveasfile()
It's a personal matter, but usually python programmers treat values in boolean context directly, without comparison:
    if done and not beensaved and askyesno(...):
The rules are documented here: http://docs.python.org/lib/truth.html . -- Leo
Re: subprocess cwd keyword.
Ivan Vinogradov wrote: Dear All, I would greatly appreciate a nudge in the right direction concerning the use of the cwd argument in the call function from the subprocess module. The setup is as follows:
    driver.py - python script
    core/ - directory
    main - fortran executable in the core directory
The driver script generates some input files in the core directory. Main should do its thing and dump the output files back into core. The problem is, I can't figure out how to do this properly. call("core/main") works but uses the parent directory (..) of core for input/output. call("core/main", cwd="core") and call("main", cwd="core") both result in [snip exception]
Usually the current directory is not in the PATH on UNIX. Try call("./main", cwd="core") -- Leo
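A runnable Python 3 sketch of the cwd behaviour. Since the fortran executable is not available here, the child process is the Python interpreter itself (an assumption made purely for the demonstration), and it simply reports the directory it was started in. run_in_directory is a hypothetical helper name.

```python
import os
import subprocess
import sys
import tempfile


def run_in_directory(argv, directory):
    """Run argv with directory as its working directory, return stdout."""
    result = subprocess.run(
        argv, cwd=directory, capture_output=True, text=True, check=True
    )
    return result.stdout.strip()


def demo():
    # "core" stands in for the directory holding the input/output files.
    with tempfile.TemporaryDirectory() as core:
        reported = run_in_directory(
            [sys.executable, "-c", "import os; print(os.getcwd())"], core
        )
        # realpath smooths over symlinked temp directories.
        return os.path.realpath(reported) == os.path.realpath(core)


if __name__ == "__main__":
    print(demo())
```

For the original setup the argv would be ["./main"]; whether a relative executable path is resolved against cwd is platform-dependent, so an absolute path to the executable is the safer spelling.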
Re: How to identify generator/iterator objects?
Kenneth McDonald wrote: I'm trying to write a 'flatten' generator which, when given a generator/iterator that can yield iterators, generators, and other data types, will 'flatten' everything so that it in turn yields stuff by simply yielding the instances of other types, and recursively yields the stuff yielded by the gen/iter objects. To do this, I need to determine (as far as I can see) what are generator and iterator objects. Unfortunately:
    >>> iter("abc")
    <iterator object at 0x61d90>
    >>> def f(x):
    ...     for s in x: yield s
    ...
    >>> f
    <function f at 0x58230>
    >>> f.__class__
    <type 'function'>
So while I can identify iterators, I can't identify generators by class.
But f is not a generator, it's a function returning a generator:
    >>> def f():
    ...     print "Hello"
    ...     yield 1
    ...
    >>> iter(f)
    Traceback (most recent call last):
      File "<input>", line 1, in ?
    TypeError: iteration over non-sequence
    >>> iter(f())
    <generator object at 0x016C7238>
    >>> type(f())
    <type 'generator'>
Notice that calling the function f has no side effect: "Hello" is not printed until the generator is actually iterated. -- Leo
Re: How to identify generator/iterator objects?
Michael Spencer wrote: Kenneth McDonald wrote: I'm trying to write a 'flatten' generator which, when given a generator/iterator that can yield iterators, generators, and other data types, will 'flatten' everything so that it in turn yields stuff by simply yielding the instances of other types, and recursively yields the stuff yielded by the gen/iter objects. To do this, I need to determine (as far as I can see) what are generator and iterator objects. Unfortunately:
    >>> iter("abc")
    <iterator object at 0x61d90>
    >>> def f(x):
    ...     for s in x: yield s
    ...
    >>> f
    <function f at 0x58230>
    >>> f.__class__
    <type 'function'>
So while I can identify iterators, I can't identify generators by class. Is there a way to do this? Or perhaps another (better) way to achieve this flattening effect? itertools doesn't seem to have anything that will do it. Thanks, Ken
I *think* the only way to tell if a function is a generator without calling it is to inspect the compilation flags of its code object:
    >>> from compiler.consts import CO_GENERATOR
    >>> def is_generator(f):
    ...     return f.func_code.co_flags & CO_GENERATOR != 0
    ...
    >>> def f1(): yield 1
    ...
    >>> def f2(): return 1
    ...
    >>> is_generator(f1)
    True
    >>> is_generator(f2)
    False
It should be noted that this check is completely irrelevant for the purpose of writing a flatten generator. Given
    def inc(n): yield n+1
the following conditions should be true:
    list(flatten([inc,inc])) == [inc,inc]
    list(flatten([inc(3),inc(4)])) == [4,5]
-- Leo
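A minimal Python 3 sketch of a flatten generator that satisfies the two conditions above. It is one possible reading of the requirement, not a definitive implementation: it recurses only into iterator objects (which includes generators) and passes everything else through, so generator functions, lists and strings are yielded unchanged.

```python
from collections.abc import Iterator


def flatten(items):
    """Yield items, recursing into any iterator or generator objects."""
    for item in items:
        if isinstance(item, Iterator):
            # Generators and other iterators are expanded in place.
            yield from flatten(item)
        else:
            # Functions, strings, lists and so on are passed through.
            yield item


def inc(n):
    yield n + 1


if __name__ == "__main__":
    print(list(flatten([inc(3), inc(4)])))
    print(list(flatten([inc, inc])) == [inc, inc])
```

Checking for the iterator protocol avoids the need to tell generators and other iterators apart, which is why the code-flag inspection is not needed here.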
Re: encoding of sys.argv ?
Jiba wrote: Hi all, I am desperately searching for the encoding of sys.argv. I use a Linux box, with French UTF-8 locales and a UTF-8 filesystem. sys.getdefaultencoding() is ascii and sys.getfilesystemencoding() is utf-8. However, sys.argv is neither in ASCII (since I can pass French accented characters), nor in UTF-8. It seems to be encoded in latin-1, but why?
Your system is misconfigured, complain to your distribution. On UNIX sys.getfilesystemencoding(), sys.stdin.encoding, sys.stdout.encoding, locale.getpreferredencoding() and the encoding of the characters you type should be the same.
Re: encoding of sys.argv ?
Marc 'BlackJack' Rintsch wrote: In [EMAIL PROTECTED], Jiba wrote: I am desperately searching for the encoding of sys.argv. I use a Linux box, with French UTF-8 locales and a UTF-8 filesystem. sys.getdefaultencoding() is ascii and sys.getfilesystemencoding() is utf-8. However, sys.argv is neither in ASCII (since I can pass French accented characters), nor in UTF-8. It seems to be encoded in latin-1, but why?
There is no way to determine the encoding. The application that starts another and sets the arguments can use any encoding it likes and there's no standard way to find out which it was.
There is a standard way: the nl_langinfo function http://www.opengroup.org/onlinepubs/009695399/functions/nl_langinfo.html The code in pythonrun.c properly uses it to find out the encoding. The other question is whether Linux or *BSD distributions conform to the standard. -- Leo.
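To see whether a system is consistently configured, the encodings mentioned in this thread can be printed side by side. A small Python 3 sketch; encoding_report is a hypothetical helper name, and nl_langinfo is guarded because the locale module does not provide it on every platform.

```python
import locale
import sys


def encoding_report():
    """Collect the encodings Python associates with the environment."""
    report = {
        "filesystem": sys.getfilesystemencoding(),
        # False: only query, do not change the locale as a side effect.
        "preferred": locale.getpreferredencoding(False),
        "stdout": getattr(sys.stdout, "encoding", None),
    }
    if hasattr(locale, "nl_langinfo"):
        # The POSIX call referred to above; reflects the current locale.
        report["nl_langinfo"] = locale.nl_langinfo(locale.CODESET)
    return report


if __name__ == "__main__":
    for name, value in sorted(encoding_report().items()):
        print(name, value)
```

If these values disagree with each other or with what the terminal actually sends, that points at the kind of misconfiguration described above.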
Re: Flexible Collating (feedback please)
Ron Adam wrote: Leo Kislov wrote: Ron Adam wrote:
    locale.setlocale(locale.LC_ALL, '') # use current locale settings
It's not the current locale settings, it's the user's locale settings. The application can actually use something else and you will overwrite that. You can also affect (unexpectedly for the application) time.strftime() and C extensions. So you should move this call into the _test() function and explain in the documentation that the application should call locale.setlocale.
I'll experiment with this a bit. I was under the impression that locale.strxfrm needed the locale set for it to work correctly.
Actually locale.strxfrm and all other functions in the locale module work as designed: they work in the C locale before the first call to locale.setlocale. This is by design; the call to locale.setlocale should be made by an application, not by a 3rd party module like your collation module.
Maybe it would be better to have two (or more) versions? A string, unicode, and locale version, or maybe add an option to __init__ to choose the behavior?
I don't think there should be two separate versions. Unicode support is only a matter of code like this:
    # in the constructor
    self.encoding = locale.getpreferredencoding()
    # class method
    def strxfrm(self, s):
        if type(s) is unicode:
            return locale.strxfrm(s.encode(self.encoding, 'replace'))
        return locale.strxfrm(s)
and then instead of locale.strxfrm call self.strxfrm. And similar code for locale.atof.
This was the reason for using locale.strxfrm. It should let it work with unicode strings, from what I could figure out from the documents. Am I missing something?
strxfrm works only with byte strings encoded in the system encoding. -- Leo
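In Python 3 the same design rule still applies (a library should not call setlocale itself), but locale.strxfrm takes str directly, so the encode step is not needed. A minimal sketch, with collation_key and sort_for_user as made-up names:

```python
import locale


def collation_key(text):
    """Sort key that follows whatever locale the application has set."""
    return locale.strxfrm(text)


def sort_for_user(words):
    # The module never calls locale.setlocale; that is left to the
    # application, so this sorts in the C locale unless told otherwise.
    return sorted(words, key=collation_key)


if __name__ == "__main__":
    print(sort_for_user(["pear", "apple", "fig"]))
```

An application that wants the user's collation rules calls locale.setlocale(locale.LC_ALL, '') once at startup, before using the helper.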
Re: comparing Unicode and string
[EMAIL PROTECTED] wrote: Thanks, John and Neil, for your explanations. Still I find it rather difficult to explain to a Python beginner why this error occurs. Suggestion: shouldn't an error be raised already when I try to assign s2? A normal string should never be allowed to contain characters that are not codable using the system encoding. This test could be made at compile time and would render Python more didactic.
This is impossible because of backward compatibility; your suggestion would break a lot of existing programs. The change is planned to happen in python 3.0, where it's ok to break backward compatibility if needed. -- Leo.
Re: right curly quote and unicode
On 10/19/06, TiNo [EMAIL PROTECTED] wrote: Now I know where the problem lies. The character in the actual file path is u+00B4 (acute accent) and in the Itunes library it is u+2019 (a right curly quote). Somehow Itunes manages to make these two the same...? As it is the only file that gave me trouble, I changed the accent in the file to an apostrophe and re-imported it in Itunes. But I would like to hear if there is a solution for this problem?
I remember I once imported into itunes a russian mp3 that violated the tagging standard by encoding the song name in windows-1251, and itunes converted the name, without even asking me, into standard-compliant utf-8. So there is some magic going on. In your case u+00B4 is a compatibility character from unicode.org's point of view and they discourage usage of such characters. Perhaps itunes is eager to make the u+00B4 character history as soon as possible. Googling for "itunes replaces acute with quote" reveals that char u+00B4 is not alone. Read the first hit. I'm afraid you will have to reverse engineer what itunes is doing to some characters. -- Leo.
Re: Flexible Collating (feedback please)
Ron Adam wrote:
    locale.setlocale(locale.LC_ALL, '') # use current locale settings
It's not the current locale settings, it's the user's locale settings. The application can actually use something else and you will overwrite that. You can also affect (unexpectedly for the application) time.strftime() and C extensions. So you should move this call into the _test() function and explain in the documentation that the application should call locale.setlocale.
    self.numrex = re.compile(r'([\d\.]*|\D*)', re.LOCALE)
    [snip]
    if NUMERICAL in self.flags:
        slist = self.numrex.split(s)
        for i, x in enumerate(slist):
            try:
                slist[i] = float(x)
            except:
                slist[i] = locale.strxfrm(x)
I think you should call locale.atof instead of float, since you call re.compile with re.LOCALE. Everything else looks fine. The biggest missing piece is support for unicode strings. -- Leo.
Re: Type discrepancy using struct.unpack
Pieter Rautenbach wrote: Hallo, I have a 64 bit server with CentOS 4.3 installed, running Python. [EMAIL PROTECTED] pymsnt-0.11.2]$ uname -a Linux lutetium.mxit.co.za 2.6.9-34.ELsmp #1 SMP Thu Mar 9 06:23:23 GMT 2006 x86_64 x86_64 x86_64 GNU/Linux Consider the following two snippets, issuing a struct.unpack(...) using Python 2.3.4 and Python 2.5 respectively. [EMAIL PROTECTED] pymsnt-0.11.2]$ python Python 2.5 (r25:51908, Oct 17 2006, 10:34:59) [GCC 3.4.5 20051201 (Red Hat 3.4.5-2)] on linux2 Type help, copyright, credits or license for more information. import struct print type(struct.unpack(L, )[0]) type 'int' [EMAIL PROTECTED] pymsnt-0.11.2]$ /usr/bin/python2.3 Python 2.3.4 (#1, Feb 17 2005, 21:01:10) [GCC 3.4.3 20041212 (Red Hat 3.4.3-9.EL4)] on linux2 Type help, copyright, credits or license for more information. import struct print type(struct.unpack(L, )[0]) type 'long' I would expect type 'long' in both cases. Why is this not so? http://mail.python.org/pipermail/python-dev/2006-May/065199.html -- Leo. -- http://mail.python.org/mailman/listinfo/python-list
Re: right curly quote and unicode
On 10/17/06, TiNo [EMAIL PROTECTED] wrote: Hi all, I am trying to compare my Itunes Library xml to the actual files on my computer. As the xml file is in UTF-8 encoding, I decided to do the comparison of the filenames in that encoding. It all works, except with one file. It is named 'The Chemical Brothers-Elektrobank-04 - Don't Stop the Rock (Electronic Battle Weapon Version).mp3'. It goes wrong with the apostrophe in Don't. That is actually not an apostrophe, but ASCII char 180: ´
It's actually Unicode char #180, not ASCII. ASCII characters are in the 0..127 range.
In the Itunes library it is encoded as: Don%E2%80%99t
Looks like a utf-8 encoded string, then encoded like a URL.
I do some conversions with both the library path names and the folder path names. Here is the code (in the comments I display how the Don't part looks; I got this using print repr(filename)). Once I have the filenames from the library I clean them using the following code (as filenames are in the format 'file://localhost/m:/music/track%20name.mp3'):
    filename = urlparse.urlparse(filename)[2][1:] # u'Don%E2%80%99t' ; side question, does anybody know a more fashionable way to do this?
    filename = urllib.unquote (filename) # u'Don\xe2\x80\x99t'
This doesn't work for me in python 2.4; unquote expects str type, not unicode. So it should be:
    filename = urllib.unquote(filename.encode('ascii')).decode('utf-8')
    filename = os.path.normpath(filename) # u'Don\xe2\x80\x99t'
I get the files in my music folder with the os.walk method and then I do:
    filename = os.path.normpath(os.path.join (root,name)) # 'Don\x92t'
    filename = unicode(filename,'latin1') # u'Don\x92t'
    filename = filename.encode('utf-8') # 'Don\xc2\x92t'
    filename = unicode(filename,'latin1') # u'Don\xc2\x92t'
This looks like calling random methods with random parameters :) Python is able to return unicode file names right away; you just need to pass the input parameters as unicode strings:
    >>> os.listdir(u"/")
    [u'alarm', u'ARCSOFT' ...]
So in your case you need to make sure the start directory parameter for the walk function is unicode.
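For comparison, a Python 3 sketch of the library-side cleanup. There urllib.parse.unquote decodes percent-escapes as UTF-8 by default, so the encode/decode dance is no longer needed. path_from_file_url is a hypothetical helper name, and the URL is a shortened stand-in for the real library entry.

```python
from urllib.parse import unquote, urlparse


def path_from_file_url(url):
    """Turn a file://localhost/... URL into a plain path string."""
    # .path is the same component the original code took with [2];
    # [1:] drops the leading slash, as in the original snippet.
    return unquote(urlparse(url).path)[1:]


if __name__ == "__main__":
    url = "file://localhost/m:/music/Don%E2%80%99t.mp3"
    print(repr(path_from_file_url(url)))
```

The result contains U+2019 (right single quotation mark), which still has to be compared against what the filesystem reports, U+00B4 in the case discussed in the follow-up.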
Re: characters in python
On Oct 18, 11:50 am, Stens [EMAIL PROTECTED] wrote: Stens wrote: Can python handle this characters: c,c,,d,? If it can, how? I want to change some characters in text (in the file) to the characters at this address: http://rapidshare.de/files/37244252/Untitled-1_copy.png.html
You need to use unicode; see any python unicode tutorial, for example this one http://www.amk.ca/python/howto/unicode or any other you can find with google. Your script can look like this:
    # -*- coding: PUT-HERE-ENCODING-OF-THIS-SCRIPT-FILE -*-
    import codecs
    outfile = codecs.open("your output file", "w", "encoding of the output file")
    for line in codecs.open("your input file", "r", "encoding of the input file"):
        outfile.write(line.replace(u'd',u'd'))
Re: characters in python
Leo Kislov wrote using google groups beta: On Oct 18, 11:50 am, Stens [EMAIL PROTECTED] wrote: Stens wrote: Can python handle this characters: c,c,,d,? [snip] outfile.write(line.replace(u'd',u'd'))
I hope you'll do better than the google engineers who mess up croatian characters in the new google groups. Of course the last 'd' should be latin d with stroke. I really typed it but google swallowed the stroke.
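A Python 3 sketch of the same script, with the replacement character written as an escape so that no mail gateway can swallow it. Assumptions: both files are UTF-8 here (pass the real encodings otherwise), the sample text is invented, and replace_in_file is a hypothetical helper name. The built-in open accepts an encoding argument, so codecs.open is not needed.

```python
import os
import tempfile

D_WITH_STROKE = "\u0111"  # LATIN SMALL LETTER D WITH STROKE


def replace_in_file(src, dst, old, new,
                    src_encoding="utf-8", dst_encoding="utf-8"):
    """Copy src to dst line by line, replacing old with new."""
    with open(src, "r", encoding=src_encoding) as infile, \
            open(dst, "w", encoding=dst_encoding) as outfile:
        for line in infile:
            outfile.write(line.replace(old, new))


def demo():
    with tempfile.TemporaryDirectory() as tmp:
        src = os.path.join(tmp, "in.txt")
        dst = os.path.join(tmp, "out.txt")
        with open(src, "w", encoding="utf-8") as fh:
            fh.write("dom\ndrvo\n")
        replace_in_file(src, dst, "d", D_WITH_STROKE)
        with open(dst, "r", encoding="utf-8") as fh:
            return fh.read()


if __name__ == "__main__":
    print(ascii(demo()))
```

Replacing every plain d is of course only for demonstration; a real script would map whatever placeholder sequences the input actually uses.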
Re: codecs.EncodedFile
Neil Cerutti wrote: It turns out to be troublesome for my case because the EncodedFile object translates calls to readline into calls to read. I believe it ought to raise a NotImplemented exception when readline is called. As it is it silently causes interactive applications to apparently hang forever, and breaks the line-buffering expectation of non-interactive applications. Does it work if stdin is a pipe? If it works then raising NotImplemented doesn't make sense. If raising the exception is too much to ask, then at least it should be documented better. Improving documentation is always a good idea. Meanwhile see my solution how to make readline method work: http://groups.google.com/group/comp.lang.python/msg/f1267dc612314657 -- http://mail.python.org/mailman/listinfo/python-list
Re: How to send E-mail without an external SMTP server ?
On Oct 15, 10:25 pm, [EMAIL PROTECTED] [EMAIL PROTECTED] wrote: Hi, I just want to send a very simple email from within python. I think the standard smtpd module in python can do this, but I haven't found documents about how to use it after googling. Are there any examples of using smtpd? I'm not an expert, so I need some examples to learn how to use it.
smtpd is for relaying mail, not for sending. What you need is a dns toolkit (search the cheeseshop) to map the domain name to a list of incoming mail servers, and then, using the stdlib smtplib, try to submit the message to them.
Or maybe there is a better way to do this?
This won't work if you're behind a strict corporate firewall or if your ISP is blocking outgoing connections on port 25. In those cases you _have_ to use an external mail server.
Re: How to send E-mail without an external SMTP server ?
On Oct 16, 12:31 am, [EMAIL PROTECTED] [EMAIL PROTECTED] wrote: Rob Wolfe wrote: [EMAIL PROTECTED] wrote: Hi, I just want to send a very simple email from within python. I think the standard module of smtpd in python can do this, but I haven't found documents about how to use it after googling. Are there any examples of using smtpd? I'm not an expert, so I need some examples to learn how to use it.
See standard documentation: http://docs.python.org/lib/SMTP-example.html HTH, Rob
I have read the example, copied the code and saved it as send.py, then I ran it. Here is the output:
    $ python send.py
    From: [EMAIL PROTECTED]
    To: [EMAIL PROTECTED]
    Enter message, end with ^D (Unix) or ^Z (Windows):
    just a test from localhost
    Message length is 82
    send: 'ehlo [202.127.19.74]\r\n'
    reply: '250-WebMail\r\n'
    reply: '250 AUTH plain\r\n'
    reply: retcode (250); Msg: WebMail AUTH plain
    send: 'mail FROM:[EMAIL PROTECTED]\r\n'
    reply: '502 negative vibes\r\n'
    reply: retcode (502); Msg: negative vibes
    send: 'rset\r\n'
    reply: '502 negative vibes\r\n'
    reply: retcode (502); Msg: negative vibes
    Traceback (most recent call last):
      File "send.py", line 26, in ?
        server.sendmail(fromaddr, toaddrs, msg)
      File "/usr/lib/python2.4/smtplib.py", line 680, in sendmail
        raise SMTPSenderRefused(code, resp, from_addr)
    smtplib.SMTPSenderRefused: (502, 'negative vibes', '[EMAIL PROTECTED]')
Do I have to set up an SMTP server on my localhost?
You need to use the login method; see http://docs.python.org/lib/SMTP-objects.html
And by the way, the subject of your message is very confusing: you are posting a log in which you send email through an external server.
Re: How to send E-mail without an external SMTP server ?
On Oct 16, 2:04 am, [EMAIL PROTECTED] [EMAIL PROTECTED] wrote: It's not safe if I have to use the login method explicitly, because then I have to put my username and password in the script. I have also tried the Unix command 'mail', but without success, either. I could use 'mail' to send an E-mail to a user on the server, but I couldn't send an E-mail to an external E-mail server. I realized that it may be because outgoing connections on port 25 are blocked, so I gave up. I will have to log in periodically to check the status of the jobs :-(
Using a username and password is safe as long as you trust the system admin; you just need to make your script readable only by you. Or, even better, put the username and password in a separate file. There is also a way to limit the damage in case you don't trust the admin: you just need to get the auth token. Start an SMTP session, call set_debuglevel(True), use the login method and look for the token:
    send: 'AUTH PLAIN HERE IS THE TOKEN\r\n'
    reply: '235 2.7.0 Accepted\r\n'
    reply: retcode (235); Msg: 2.7.0 Accepted
Then put the token in a file readable only by you, and from now on use docmd('AUTH PLAIN', YOUR TOKEN FROM FILE) instead of the login() method. If the token is stolen, the thief can only send mail from your account but won't be able to login with password.
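A rough Python 3 sketch of the token idea, for illustration only. The helper names (`make_auth_plain_token`, `save_token`, `authenticate_with_token`) are hypothetical; `docmd` is the real smtplib method mentioned in the post. One caveat worth stating plainly: an AUTH PLAIN token is just the base64 encoding of NUL + username + NUL + password, so anyone who can read the token can decode the password from it. The protection comes from the file permissions, not from the token itself.

```python
import base64
import os
import smtplib


def make_auth_plain_token(username, password):
    """Build the string a client sends after 'AUTH PLAIN'."""
    raw = ("\0%s\0%s" % (username, password)).encode("utf-8")
    return base64.b64encode(raw).decode("ascii")


def save_token(path, token):
    """Write the token to a file created with owner-only permissions.

    The 0o600 mode is honoured on Unix; Windows largely ignores it.
    """
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    with os.fdopen(fd, "w") as handle:
        handle.write(token)


def authenticate_with_token(server, token):
    """Authenticate an already connected smtplib.SMTP session."""
    code, response = server.docmd("AUTH PLAIN", token)
    if code != 235:  # 235 means authentication succeeded
        raise smtplib.SMTPAuthenticationError(code, response)
```

Building the token locally gives the same string you would otherwise copy out of the debug log.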
Re: Alphabetical sorts
On Oct 16, 2:39 pm, Tuomas [EMAIL PROTECTED] wrote: My application needs to handle different language sorts. Do you know a way to apply strxfrm dynamically i.e. without setting the locale?
Collation is almost always locale dependent, so you have to set the locale. One day I needed collation that worked on both Windows and Linux. It's not that polished and not that tested, but it worked for me:
    import locale, os, codecs

    current_encoding = 'ascii'
    current_locale = ''

    def get_collate_encoding(s):
        '''Grab character encoding from locale name'''
        split_name = s.split('.')
        if len(split_name) != 2:
            return 'ascii'
        encoding = split_name[1]
        if os.name == 'nt':
            # Windows locale names end in a bare code page number
            encoding = 'cp' + encoding
        try:
            codecs.lookup(encoding)
            return encoding
        except LookupError:
            return 'ascii'

    def setup_locale(locale_name):
        '''Switch to new collation locale or do nothing if locale is the same'''
        global current_locale, current_encoding
        if current_locale == locale_name:
            return
        current_encoding = get_collate_encoding(
            locale.setlocale(locale.LC_COLLATE, locale_name))
        current_locale = locale_name

    def collate_key(s):
        '''Return collation weight of a string'''
        return locale.strxfrm(s.encode(current_encoding, 'ignore'))

    def collate(lst, locale_name):
        '''Sort a list of unicode strings according to locale rules.
        Locale is specified as 2 letter code'''
        setup_locale(locale_name)
        return sorted(lst, key=collate_key)

    words = u'c ch f'.split()
    print ' '.join(collate(words, 'en'))
    print ' '.join(collate(words, 'cz'))
Prints:
    c ch f
    c f ch
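For comparison, a shorter sketch of the same idea in Python 3, where `locale.strxfrm` accepts text strings directly, so the encoding bookkeeping above is not needed. This is an illustration rather than a port of the code above: `collation_locale` is a hypothetical helper, and which locale names are accepted depends entirely on the operating system. Note that `setlocale` changes process-wide state, so this is not safe to call from several threads at once.

```python
import contextlib
import locale


@contextlib.contextmanager
def collation_locale(locale_name):
    """Temporarily switch LC_COLLATE, restoring the old value afterwards."""
    previous = locale.setlocale(locale.LC_COLLATE)  # query the current setting
    locale.setlocale(locale.LC_COLLATE, locale_name)
    try:
        yield
    finally:
        locale.setlocale(locale.LC_COLLATE, previous)


def collate(words, locale_name):
    """Sort strings according to the collation rules of locale_name."""
    with collation_locale(locale_name):
        return sorted(words, key=locale.strxfrm)
```

Restoring the previous locale keeps the rest of the program unaffected, which the module-level globals in the original version do not attempt.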
Re: Need a Regular expression to remove a char for Unicode text
On Oct 13, 4:44 am, [EMAIL PROTECTED] wrote: శ్రీనివాస wrote: Hi friends, can anyone tell me how I can remove a character from a Unicode text? కల్హార is a Telugu word in Unicode. Here I want to remove '' but not replace it with a zero width char. And one more thing: if there is any whitespace before and after the '' char, the text should be kept as it is. Please tell me how I can work this out with regular expressions. Thanks and regards, Srinivasa Raju Datla
Don't know anything about Telugu, but is this the approach you want?
    x=u'\xfe\xff \xfe\xff \xfe\xff\xfe\xff'
    noampre = re.compile('(?!\s)(?!\s)', re.UNICODE).sub
    noampre('', x)
He wants to replace with zero width joiner, so the last call should be noampre(u"\u200D", x)
Re: Need a Regular expression to remove a char for Unicode text
On Oct 13, 4:55 am, Leo Kislov [EMAIL PROTECTED] wrote: On Oct 13, 4:44 am, [EMAIL PROTECTED] wrote: శ్రీనివాస wrote: Hi friends, can anyone tell me how I can remove a character from a Unicode text? కల్హార is a Telugu word in Unicode. Here I want to remove '' but not replace it with a zero width char. And one more thing: if there is any whitespace before and after the '' char, the text should be kept as it is. Please tell me how I can work this out with regular expressions. Thanks and regards, Srinivasa Raju Datla
Don't know anything about Telugu, but is this the approach you want?
    x=u'\xfe\xff \xfe\xff \xfe\xff\xfe\xff'
    noampre = re.compile('(?!\s)(?!\s)', re.UNICODE).sub
    noampre('', x)
He wants to replace with zero width joiner, so the last call should be noampre(u"\u200D", x)
Pardon my poor reading comprehension: the OP doesn't want a zero width joiner. Though I'm confused why he mentioned it at all.
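Since the character the OP wanted to remove did not survive in the archive, here is a generic Python 3 sketch that takes the character as a parameter. It assumes the intended rule is "remove the character unless whitespace sits directly next to it", which is one reading of the request; `remove_unspaced` is a hypothetical name, and the lookbehind/lookahead pattern is my guess at the technique, not a recovered copy of the pattern quoted above.

```python
import re


def remove_unspaced(text, char, replacement=""):
    """Replace char wherever it has no whitespace directly before or after it.

    Occurrences that touch whitespace on either side are left alone.
    """
    pattern = re.compile(r"(?<!\s)" + re.escape(char) + r"(?!\s)")
    # A function is used as the replacement so that backslashes in
    # `replacement` are never interpreted as regex escapes.
    return pattern.sub(lambda match: replacement, text)
```

Passing `replacement="\u200d"` gives the zero width joiner variant discussed above, while the default simply deletes the character.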
Re: does raw_input() return unicode?
Theerasak Photha wrote: On 10/10/06, Martin v. Löwis [EMAIL PROTECTED] wrote: Theerasak Photha wrote:
At the moment, it only returns unicode objects when invoked in the IDLE shell, and only if the character entered cannot be represented in the locale's charset.
Why only IDLE? Does urwid or another console UI toolkit avoid this somehow?
I admit I don't know what urwid is; from a shallow description I find (a console user interface library) I can't see the connection to raw_input(). How would raw_input() ever use urwid?
The other way around: would urwid use raw_input() or other Python input functions anywhere? And what causes Unicode input to work in IDLE alone?
Applications other than python are actually free to implement unicode stdin. Python cannot do it because of backward compatibility. You can argue that the python interactive console could do it too, but think about it this way: the python interactive console deliberately behaves like a running python program would.
Re: does raw_input() return unicode?
Duncan Booth wrote: Stuart McGraw [EMAIL PROTECTED] wrote: So, does raw_input() ever return unicode objects and if so, under what conditions?
It returns unicode if reading from sys.stdin returns unicode. Unfortunately, I can't tell you how to make sys.stdin return unicode for use with raw_input. I tried what I thought should work and as you can see it messed up the buffering on stdin. Does anyone else know how to wrap sys.stdin so it returns unicode but is still unbuffered?
Considering that all consoles are ascii based, the following should work where python was able to determine the terminal encoding:
    import sys

    class ustdio(object):
        def __init__(self, stream):
            self.stream = stream
            self.encoding = stream.encoding
        def readline(self):
            # decode one raw line using the terminal's encoding
            return self.stream.readline().decode(self.encoding)

    sys.stdin = ustdio(sys.stdin)
    answer = raw_input()
    print type(answer)
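The same wrap-and-decode technique, sketched for Python 3 as an illustration. There `input()` already returns text, so the wrapper is only useful around a byte stream such as `sys.stdin.buffer`. `DecodingReader` is a hypothetical name, and the UTF-8 fallback when no encoding is known is my assumption, not something from the thread.

```python
import io


class DecodingReader:
    """Wrap a byte stream and hand back decoded text one line at a time."""

    def __init__(self, stream, encoding=None, errors="strict"):
        self.stream = stream
        # Fall back to UTF-8 when the stream does not report an encoding
        # (an assumption; pick whatever suits your environment).
        self.encoding = encoding or getattr(stream, "encoding", None) or "utf-8"
        self.errors = errors

    def readline(self):
        # Delegate to the underlying readline so line buffering is preserved,
        # then decode just that one line.
        return self.stream.readline().decode(self.encoding, self.errors)


# Usage with an in-memory byte stream standing in for a real input stream.
demo = DecodingReader(io.BytesIO("h\u00e9llo\nworld\n".encode("utf-8")), "utf-8")
first_line = demo.readline()
```

Delegating to the wrapped stream's own `readline` is what avoids the buffering trouble described above, because no more bytes are consumed than one line.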
Re: People's names (was Re: sqlite3 error)
John J. Lee wrote: Steve Holden [EMAIL PROTECTED] writes: [...] There would also need to be a flag field to indicate the canonical ordering for writing out the full name: e.g. family-name-first, given-names-first. Do we need something else for the Vietnamese case? You'd think some standards body would have worked on this, wouldn't you. I couldn't think of a Google search string that would lead to such information, though. Maybe other, more determined, readers can do better. I suppose very few projects actually deal with more than a handful of languages or cultures, but it does surprise me how hard it is to find out about this kind of thing -- especially given that open source projects often end up with all kinds of weird and wonderful localised versions. On a project that involved 9 localisations, just trying to find information on the web about standard collation of diacritics (accented characters) in English, German, and Scandinavian languages was more difficult than I'd expected.
As far as I understand, unicode.org has become the central(?) source of locale information: http://unicode.org/cldr/ Did you use it?
Re: How to find number of characters in a unicode string?
Lawrence D'Oliveiro wrote: In message [EMAIL PROTECTED], Marc 'BlackJack' Rintsch wrote: In [EMAIL PROTECTED], Preben Randhol wrote: Is there a way to calculate in characters and not in bytes to represent the characters.
Decode the byte string and use `len()` on the unicode string.
Hmmm, for some reason len(u"C\u0327") returns 2.
If python ever provides this functionality it would be, I guess, u"C\u0327".width() == 1. But it's not clear when unicode.org will provide recommended fixed-width font character width information for *all* characters. I recently stumbled upon the Tamil language, where for example u'\u0b95\u0bcd', u'\u0b95\u0bbe', u'\u0b95\u0bca', u'\u0b95\u0bcc' look like they have widths of 1, 2, 3 and 4 columns. To add insult to injury, these 4 symbols are all considered *single* letter symbols :) If your email reader is able to show them, here they are in all their glory: க், கா, கொ, கௌ.
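To make the len() observation concrete, here is a small Python 3 sketch that counts characters while skipping combining marks, using the standard unicodedata module. `count_base_characters` is a hypothetical helper and only an approximation: it is not full grapheme cluster segmentation, and it says nothing about how many columns a string occupies on screen, which is the harder problem raised above.

```python
import unicodedata


def count_base_characters(text):
    """Count code points that are not combining marks.

    General categories starting with "M" (Mn, Mc, Me) are marks that
    attach to a preceding base character.
    """
    return sum(
        1 for ch in text
        if not unicodedata.category(ch).startswith("M")
    )


cedilla = "C\u0327"  # C followed by COMBINING CEDILLA
tamil = ["\u0b95\u0bcd", "\u0b95\u0bbe", "\u0b95\u0bca", "\u0b95\u0bcc"]
code_points = len(cedilla)
base_characters = count_base_characters(cedilla)
tamil_counts = [count_base_characters(s) for s in tamil]
```

With this helper the cedilla example and each of the four Tamil strings count as one character, although len() reports two code points for each of them.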