Re: Error on base64.b64decode() ?!
On 12 Oct., 17:09, Jean-Paul Calderone wrote:
> If you get an incorrect padding error, try appending a = and decoding
> again. If you get the error again, try appending one more =. If it
> still doesn't work, then you might be out of luck.

This seems to work in some cases, but not all. What's the deal with adding the =? Is there an implementation error in Python, or are other implementations of Base64 more robust than they have to be?

Christoph

-- http://mail.python.org/mailman/listinfo/python-list
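The padding advice above follows from how Base64 works: every 3 bytes become 4 characters, so valid input has a length that is a multiple of 4, and = fills up the last group. Below is a minimal sketch in current Python 3 (the thread itself uses Python 2). The helper name decode_lenient is hypothetical and not part of the base64 module, and the sketch assumes that stray whitespace and missing padding are the only defects in the data.

```python
import base64
import re


def decode_lenient(text):
    """Decode Base64 text after removing whitespace and restoring padding."""
    # Line wrapping in mail bodies often leaves spaces or newlines in the data.
    compact = re.sub(r"\s+", "", text)
    # Drop any existing padding, then add back exactly what is needed.
    compact = compact.rstrip("=")
    if len(compact) % 4 == 1:
        # One leftover character cannot encode a whole byte: data is truncated.
        raise ValueError("truncated Base64 data")
    compact += "=" * (-len(compact) % 4)
    return base64.b64decode(compact)
```

When exactly one character is left over, no amount of padding can help, which is probably the "out of luck" case: the data was cut off rather than merely unpadded.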
Error on base64.b64decode() ?!
Hello everybody, I am using a python script to extract images from email messages. This works fine for some messages, but not all attached images can be decoded. I use the following code to decode the image and save it to a database: try: imagedec = base64.b64decode(imageenc) imagehash = md5.new(imagedec).hexdigest() dbcurs.execute(save image) except TypeError, e: print Error '%s' in Message %s % (e, dbrow[1]) print imageenc) The problem is that for some images, the method b64decode() returns a TypeError Incorrect Padding. But if I paste the content of imageenc to some other base64 decode (like this one: http://www.php-einfach.de/base64_generator.php?code=1), I get the correct result. One example content is included below. Is this a bug in the base64 module, or is there something wrong with my code? Regards, Christoph --- R0lGODlhOwKDAfcAAICAAICAgIAAgACAgICAgMDAwP8AAAD/AP//// 8A/wD/ / wAA MwAAZgAAmQAAzAAA/ wAzAAAzMwAzZgAzmQAzzAAz/wBm AABmMwBmZgBmmQBmzABm/wCZAACZMwCZZgCZmQCZzACZ/wDMAADMMwDMZgDMmQDMzADM/ wD/AAD/ MwD/ZgD/mQD/zAD//zMAADMAMzMAZjMAmTMAzDMA/zMzADMzMzMzZjMzmTMzzDMz/ zNmADNmMzNm ZjNmmTNmzDNm/zOZADOZMzOZZjOZmTOZzDOZ/zPMADPMMzPMZjPMmTPMzDPM/zP/ADP/ MzP/ZjP/ mTP/zDP//2YAAGYAM2YAZmYAmWYAzGYA/2YzAGYzM2YzZmYzmWYzzGYz/ 2ZmAGZmM2ZmZmZmmWZm zGZm/2aZAGaZM2aZZmaZmWaZzGaZ/2bMAGbMM2bMZmbMmWbMzGbM/2b/AGb/M2b/Zmb/ mWb/zGb/ /5kAAJkAM5kAZpkAmZkAzJkA/5kzAJkzM5kzZpkzmZkzzJkz/ 5lmAJlmM5lmZplmmZlmzJlm/5mZ AJmZM5mZZpmZmZmZzJmZ/5nMAJnMM5nMZpnMmZnMzJnM/5n/AJn/M5n/Zpn/mZn/zJn// 8wAAMwA M8wAZswAmcwAzMwA/8wzAMwzM8wzZswzmcwzzMwz/8xmAMxmM8xmZsxmmcxmzMxm/ 8yZAMyZM8yZ ZsyZmcyZzMyZ/8zMAMzMM8zMZszMmczMzMzM/8z/AMz/M8z/Zsz/mcz/zMz///8AAP8AM/ 8AZv8A mf8AzP8A//8zAP8zM/8zZv8zmf8zzP8z//9mAP9mM/9mZv9mmf9mzP9m//+ZAP+ZM/+ZZv +Zmf+Z zP+Z///MAP/MM//MZv/Mmf/MzP/MAP//M///Zv//mf//zP/// yH5BAEAABAALAA7AoMB AAj/ ANFpwoJCCi5NKOww4cZEnCZxUqTAEedLHApfLviIs4OLGyWBvqRo0sSH40BfCjXZ8fWQmx1x xDTREOkLFJOKKimKg +NSEyWcFVna0QTHjgtKK6X4cskThThcvkJSYmJQHBZxLhriknJTylCIxFhy 
06RUCjpcfBLCISYF1MijlIhxVIoiLFSJGL3agSOFkjhKEnFRIiuS6M6tf8+SRcdkokWUxFz8ZeJC qYuRG4syscONJS4XbkWC +tlQ02aRESty3CvFqiZcuNCJAyUlrB0pfJhQguMith1QE32h88WELERK WJaqHMpNIlfD3CD6goOSIhbAL3E99bpwb9TKU0G5/5h +fDNVXCgOonArXbcdLENV9lY5UiptYhP9 MkGBovFS2EZhtZhwQ31F0Wt8ZDWUbg/ xFVdhjXE13kEx8bGVHZSgBRETxOw2mFJwkMVRSRCpNJVX f2kSkna+0LBSekwkuNlwGNph41Y/ RUUVCuhI0Vxf6KxEUWkUCfbTT2S9JkVWKMHRozibOQVHV1CF aB5ZoIx1WXVLyoWaOFBSZCMWRY20Uo7T1eeLYAqxRVtaQ1FSUIhehWjmTVn1BcdFKBCF0V5QUrZS SAbtxY0L3OBSY5+aMGRTk0xEupQdKCTKmG4SDfaanSoi16hSg8LB25pIRbfRQZQhhYKcayqqyXgv Ff/ 2GqUqdcZRpboRBxt1XK10G0Go6RbXRQZl1NiUR930EmBHUSdffTf5ZBActPkC2Ge4UMcRlJ1J plKfjUW6H6VL4lJcQiPtWZu4je6JYUHmGmWtRNxQV1OVT/EkBbqU +KWJXNFNR0lzPxXkAnNU+SVY Qs16Ve++fAWnqVeXYaGiJqNBGRFRu +HFVIZEUZXgoSk6tZlkt10LClSVRbkRS3jd5qPCPl1GGWWK ZjuaTwIKBy9GU+7lQkaDdaYUh9mSlltkQ +npwqqRcbUmQhmO1SOGjblolLmhbcbsmgyZ9tpOrFFi YUp+cYNOQoOVZpRpcKiHUn/PfYfeZ/ GBKepClU3/5CBgO7Ekp0USwTFVcYgnV91LzvrSZ1QBjtbh RjvRRrhZGQYdEkUiYQgzhmbjQlCAU7bVFl96nbUkRkSxNbZIPDGBhc3ccINFc6sxxBKY9FLy9Fco vQpRZ2tF5epQ6fGxZkJo7dURcqI6pfdLjUXHl8XEzSq7WSH1RpVFa/ G03F8i3XQWhsLjAkpFoDm0 1ZLUdYbuQW5VthRXIcr5aqKWCYYXH2ThC1mK06+LII5QS9rLZgyHAsON5EOxYQKAolKQiKCABlvJ n4puA5E4iYMPrdnKa/rmnZUEx0ZmEkm2ItUa0BhkKyiZTl0mksG +zG5NqzkY9C5mPEL9hCK9Gdpv / yISkZ9kyywowAKIDGeQkYSFMEXxyL6A1sBXpeQ2vUGNtlQCFSShw0OBUZFWctMbjKwpJHGjiF/ I VZSpVEkwphkRlDD2EGtJ8FUXyY8AEzYv4hANMF4ZXEea1CGPnIUnT5TZ4AjmkIgoRGi +IAhbJPKP SlrykpjMpCY3yclOevKToAylKEdJylKa8pSoTKUqV8nKVrrylbCMpSxnSctL9gkpYKqlLnfJy176 8pfADKYwh0nMYhKzOBXhijGXycxmOvOZ0IymNKfZTN5QhSzUzKY2t8nNbtISBd4MpzHrVStciPOc 6EynOtfJznaSclQUgYM750nPetrznvik5bbgUP+SfPrzn/ 8AJ0AHSlB07udpWymoQhfK0IY6tJZL 8g8lHkrRilr0ohaljl +shdGOepSZAv2oSEsZKEXJc6Qo7WZIU8rSgXZxJr5oqUxnStOashIdIbqM C2zK05769KeRMttmfkrUi660qJY8qkxdEJyhIfWpUK2oUmXqCwCqRApRzapWCzpVY3aVmU8j3ES3 Stay2vOrI8XOrMzK1ramE60jDQlOEeLWutp1m3D96Fq6iIW7gpIGOqCBMQF7ScJiErCCxWRgdZBJ xGZysY0N7GMNW1jJKpaylnRsKRG7WM4mVpOe7awoPftXy2YWsqbU7CVRq0nWVlK0qcxrR00zHeL/ +PWTrt3kZy3J2Faa9h+/fS1mgcvZy/ 
a2srv9B2mRO9njCje3uQ1lZ6c73NWGVrWcpG50jYtc55Iy utg97G9hi0rZdhQ2vugQVm/ bSR0EN7KX9e1u3btJ17oXsM7dLnDHe9/8vve19D3tfVcb4M26d7H9 9W5lE4zg9jIYvwom8GchnMoBc7e+lD1wcktp3o72yI8drqt +K4ndEZMSs9Ul8XiLe9oNP7e7802x clfs2vCa0sS8Ha6JWaziFIOTtTjGbXVt3OIis9dJhAIFezupWQ0LGLIItiyUbQzdGGP4uFPGsozx W2QLzzjCxMUyfYE83BD3uLTJ5bFiwQtmFfc2/8if5PKa24xaL98WCzfZDzeWnN3rSpm0oe2xdpmb Yy2DWbWORfR/w1xoRr8YtKZNdKQXPUo4s9nFjv4uYeEsZAUT +cyfrutteLQTPtcXyBa2L38nnV/M FnjGTA7wgff7Zko7mstynnV2/8xY1MpZlb +OtYJ1Hdk2OxjCMq50lJFNZ2Ybm60iecraTN1aHRu2 xCgObqiJa932bjrD3za2pCcs2W1nudGc9qSlh7zodFuXuqvU7nIhHWhWmrmi/ LTRbagd2TTzmtCP bjQnsx1rwSra4LY+94t3XO5JIxzTJ7a1kd +tW4lXfNzAlveIp5tpvzJ1bQ3k94UFjm3nhjfUBP/3 dq/BvXJxN5zcD1d5phWuynV7ut3JVnli3V3tm0Ocxds2656iFFOR8za4qXb1que8626ru7/ GtTOk pc5gdVOYwA1eZbAH/moAH5rYRz821iv83o0rHeJmNUpyNmJ0gZOcxgA/ c8UrG2fyChrtz7V2kOd9 d/k+O+CA77fJze5dnlNcvD/XtsW1uhLY9Kntcjcy2IOddeVe8PDcDrvVwU7rnAP45Ha/ st67jng0
Re: MemoryError on reading mbox file
On 12 Sep., 16:39, Istvan Albert wrote:
> This line reads an entire message into memory as a string. Is it
> possible that you have a huge email in there (hundreds of MB) with
> some attachment encoded as text?

No, the largest single message within the mbox is about 100 KB.

> For now I would recommend that you split your mbox file into several
> smaller ones (I think all you need is to split at the To: fields)
> and run your script on these individual files.

I got it to work by splitting the mbox file into single files, one for each message, with the git-mailsplit tool that is included in the gitk package. This solved the problem for now.

Thanks for all your help.

Christoph
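The splitting approach can also be sketched directly in Python without an external tool. The following is a rough illustration in Python 3, not what git-mailsplit does internally. It assumes the conventional mbox layout, in which each message begins with a line starting with "From " (not the To: header), and it assumes that such lines inside message bodies have been escaped as ">From ". The function name iter_mbox_messages is hypothetical.

```python
def iter_mbox_messages(path):
    """Yield each message of an mbox file as bytes, one at a time."""
    current = []
    with open(path, "rb") as handle:
        for line in handle:
            # A line starting with "From " opens the next message.
            if line.startswith(b"From ") and current:
                yield b"".join(current)
                current = []
            current.append(line)
    # Emit the last message, which has no separator after it.
    if current:
        yield b"".join(current)
```

Because the file is read line by line, only one message is held in memory at a time, regardless of the size of the mbox file.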
MemoryError on reading mbox file
Hello everybody,

I have to convert a huge mbox file (~1.5G) to MySQL. I tried the following simple code:

    for m in mailbox.mbox(fileName):
        msg = m.as_string(True)
        hash = md5.new(msg).hexdigest()
        try:
            dbcurs.execute("INSERT INTO archive (hash, msg) VALUES (%s, %s)", (hash, msg))
        except MySQLdb.OperationalError, err:
            print "%s Error (%d): %s" % (fileName, err[0], err[1])
        else:
            print "%s: Message successfully added to database" % hash

The problem seems to be the size of the file. Every time I try to execute the script, the following error occurs after about 2 messages:

    Traceback (most recent call last):
      File "email_to_mysql_mbox.py", line 21, in <module>
        for m in mailbox.mbox(fileName):
      File "/usr/lib/python2.5/mailbox.py", line 98, in itervalues
        value = self[key]
      File "/usr/lib/python2.5/mailbox.py", line 70, in __getitem__
        return self.get_message(key)
      File "/usr/lib/python2.5/mailbox.py", line 633, in get_message
        string = self._file.read(stop - self._file.tell())
    MemoryError

My system has 512M RAM and 768M swap, both of which seem to run out at an early stage. Is there a way to free the memory used by messages that have already been processed?

Thanks and regards,
Christoph
Re: MemoryError on reading mbox file
On 12 Sep., 12:20, David wrote:
> It may be that Python's garbage collection isn't keeping up with your
> app. You could try periodically forcing it to run, e.g.:
>
>     import gc
>     gc.collect()

I tried this, but the problem is not solved. When I invoke the garbage collector after every loop iteration, the amount of memory indicated by top stays the same for a very long time. Then at some point (at different messages), while the loop header is executing, memory use increases until both RAM and swap hit 100% => MemoryError.

Could there be a problem within the mailbox module when it processes very large files?

Regards,
Christoph
re.sub does not replace all occurences
Hello everybody,

I wanted to use re.sub to strip all HTML tags out of a given string. I learned that there are better ways to do this without the re module, but I would like to know why my code is not working. I use the following:

    def stripHtml(source):
        source = re.sub("[\n\r\f]", "", source)
        source = re.sub("<.*?>", "", source, re.S | re.I | re.M)
        source = re.sub("&(#[0-9]{1,3}|[a-z]{3,6});", "", source, re.I)
        return source

But the result still has some tags in it. When I call the second line multiple times, all tags disappear, but since HTML tags cannot overlap, I do not understand this behavior. There is even a difference when I omit the re.I (IGNORECASE) option: without it, some tags containing only capital letters (like </FONT>) were kept in the string after one processing run but removed after multiple runs.

Perhaps someone can tell me why this regex behaves like this.

Thanks and regards,
Christoph
Re: re.sub does not replace all occurences
Neil Cerutti wrote:
> In other words, the fourth argument to sub is count, not a set of re
> flags.

I knew it had to be something very stupid. Thanks a lot.
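The effect of passing flags in the count position can be shown with a small example. The flag constants are plain integers (re.I is 2, re.M is 8, re.S is 16), so re.S | re.I | re.M is read as count=26: at most 26 replacements per call, with no flags applied. That matches the observation that repeated calls eventually remove every tag. The sketch below is written for Python 3; the sample string and pattern are made up for illustration.

```python
import re

html = "<b>one</b> <b>two</b> <B>three</B>"
pattern = r"</?[a-z]+>"

# A positional fourth argument lands in `count`. Spelled out, passing re.I
# there means: case-sensitive matching, at most two replacements.
as_count = re.sub(pattern, "", html, count=int(re.I))

# Passing the flags by keyword applies them as intended.
as_flags = re.sub(pattern, "", html, flags=re.I)

# Compiling with flags also works on Python versions whose re.sub
# has no flags parameter at all.
compiled = re.compile(pattern, re.I).sub("", html)

print(as_count)   # one <b>two</b> <B>three</B>
print(as_flags)   # one two three
print(compiled)   # one two three
```

In the Python 2.5 used in this thread, re.sub accepts no flags argument, so compiling the pattern with re.compile is the applicable form there.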
Read binary data from MySQL database
Hello,

I am trying to write a Python application with wx that shows images from a MySQL database. I use the following code to connect and fetch the data when some event is triggered:

    dbconn = MySQLdb.connect(host="localhost", user=..., passwd=..., db="images")
    dbcurs = dbconn.cursor()
    dbcurs.execute("SELECT imgdata FROM images LIMIT 1")
    imgstring = dbcurs.fetchone()[0]
    frame.showImage(imgstring)

Within my frame, the following method is defined:

    def showImage(self, imgstring):
        imgdata = StringIO.StringIO()
        imgdata.write(imgstring)
        print imgdata.getvalue()
        wx.ImageFromStream(imgdata, wx.BITMAP_TYPE_GIF)
        panel = wx.Panel(self, -1)
        self.panel = panel

But this does not work. The converter says that the data is not valid GIF. When I print the content of imgstring after the database select statement, it contains something like this:

    array('c', 'GIF89aL\x01=\x01\x85\x00\x00\x00\x00\x00\xff\xff\xff \x00\xff\xff\xff[...]\x00\x00;')

When I print imgstring[1], the result is 'I'. So I don't quite understand what this printed result means or why my input should not be valid. The data in the database is correct; I can restore the image with tools like the MySQL Query Browser.

Thanks in advance,
Christoph
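The printed array('c', ...) suggests that the driver returned an array object rather than a plain string, which would also explain why imgstring[1] is 'I' (the second character of GIF89a). Separately, writing into a StringIO leaves its position at the end, so a reader that starts from the current position finds no data. The sketch below illustrates both points in Python 3. The sample bytes are made up, typecode 'B' stands in for the Python 2 'c' typecode, and Python 2.5 spells tobytes() as tostring(). Whether these two changes are enough to fix the wx error is an assumption; the sketch does not call wx.

```python
import array
import io

# Stand-in for the value returned by the database driver.
imgstring = array.array("B", b"GIF89a\x01\x00\x01\x00\x00\x00\x00;")

raw = imgstring.tobytes()    # plain bytes instead of an array object
stream = io.BytesIO(raw)     # created from data: position starts at 0

# Writing into an empty stream leaves the position at the end instead.
written = io.BytesIO()
written.write(raw)
at_end = written.read()      # empty: nothing left after the position
written.seek(0)              # rewind before handing the stream to a reader
rewound = written.read()     # the complete data
```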
Broken pipe with os.popen3()
Hello everybody,

I am trying to use an external OCR tool to convert some binary image data to text. The image is in one variable, and the resulting text should end up in another. I use the following code:

    (si, so, se) = os.popen3('ocrad')
    si.write(frame)
    si.close()
    messagetext += so.read()

This code leads to a broken pipe error. I think this is because the command already writes data to stdout after getting the first part of the input. But when I change the order of the code lines, i.e. read from the pipe so before writing to si, the program hangs, because no data is written to stdout before the first bytes are written to stdin.

Any idea how to solve this issue? How do I read and write simultaneously?

Thanks in advance,
Christoph
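Reading and writing at the same time is what subprocess.Popen.communicate() provides: it feeds stdin, closes it, and drains stdout and stderr concurrently. The sketch below uses Python 3 and a small Python child process as a stand-in for ocrad, so that it runs without the OCR tool installed. The child command and the payload are illustrative assumptions.

```python
import subprocess
import sys

# Stand-in for the external tool: reads all of stdin and writes an
# upper-cased copy to stdout.
child_cmd = [
    sys.executable,
    "-c",
    "import sys; sys.stdout.buffer.write(sys.stdin.buffer.read().upper())",
]

frame = b"binary image data " * 10000

proc = subprocess.Popen(
    child_cmd,
    stdin=subprocess.PIPE,
    stdout=subprocess.PIPE,
    stderr=subprocess.PIPE,
)
# communicate() handles all three pipes together, so neither side
# blocks on a full pipe buffer.
out, err = proc.communicate(frame)
```

For the real tool, child_cmd would be replaced by the ocrad command line, with out holding the recognized text.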
Using os.popen3() to get binary data
Hello everybody,

I need to get the individual frames from a GIF image in my Python script, and I want to use the giftopnm program from netpbm to extract the frames and convert them directly to pnm files. I tried the following code:

    for image in images:
        if (image[0:3] == 'GIF'):
            (si, so, se) = os.popen3('giftopnm -image=all', 'b')
            si.write(image)
            frame = so.readlines()

But with this code the script just hangs. When I interrupt the script, I get the following error message:

    Traceback (most recent call last):
      File "/home/tiger/stock-spam/scripts/all_in_one.py", line 46, in ?
        frames = so.readlines()
    KeyboardInterrupt
    close failed: [Errno 32] Broken pipe

Can somebody tell me which command I have to use so that the pipe is closed when giftopnm returns? The program just prints the converted images to stdout and terminates.

Thanks in advance,
Christoph
Re: Using os.popen3() to get binary data
Just got the solution... After sending the image data with si.write(image), I have to close the pipe with si.close() to tell the program that the input is complete and that it should convert the image. Now everything works fine.

Christoph
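Closing the write end is what signals end-of-file to the child process. A tool that reads its whole input before producing output will wait forever otherwise. A minimal sketch in Python 3 follows, with a small Python child as a hypothetical stand-in for giftopnm; it is only safe for payloads small enough to fit in the pipe buffers.

```python
import subprocess
import sys

# Stand-in for giftopnm: produces output only after seeing EOF on stdin.
child_cmd = [
    sys.executable,
    "-c",
    "import sys; data = sys.stdin.buffer.read(); "
    "sys.stdout.buffer.write(data[::-1])",
]

proc = subprocess.Popen(child_cmd, stdin=subprocess.PIPE, stdout=subprocess.PIPE)
proc.stdin.write(b"GIF89a")
proc.stdin.close()            # signals EOF; without it the child keeps waiting
result = proc.stdout.read()   # returns once the child has written and exited
proc.stdout.close()
proc.wait()
```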
How to access multiple group matches?
Hello,

I want to use the re module to split a data stream that consists of several blocks of data. I use the following code:

    iter = re.finditer('^(HEADER\n.*)+$', data)

The data variable contains binary data with the word HEADER in some places, each followed by binary data up to the next occurrence of HEADER or the end of the file. But when I iterate over iter, I get only one match, and this match contains only one group. How can I access the other matches? The data may contain tens of them.

Thanks in advance,
Christoph
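A repeated group such as (...)+ keeps only its last repetition, and without flags ^ and $ anchor the whole string while . stops at newlines, so the pattern above can yield one match at most. One possible alternative, sketched in Python 3 with made-up sample data, is to match one block per iteration and stop lazily before the next HEADER line. It assumes that the byte sequence HEADER followed by a newline never occurs inside a block's payload.

```python
import re

data = b"HEADER\n\x00\x01first\nHEADER\nsecond\x02\nmore\nHEADER\nthird"

# Each match starts at a HEADER line and extends lazily to just before
# the next HEADER line or the end of the data. DOTALL lets "." match
# newlines inside the binary payload.
block_re = re.compile(rb"HEADER\n(.*?)(?=HEADER\n|\Z)", re.DOTALL)

blocks = [m.group(1) for m in block_re.finditer(data)]
```

Here blocks holds one bytes object per block, in order of appearance.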