Re: Error on base64.b64decode() ?!

2007-10-13 Thread Christoph Krammer
On 12 Oct., 17:09, Jean-Paul Calderone [EMAIL PROTECTED] wrote:
> If you get an incorrect padding error, try appending a = and decoding
> again.  If you get the error again, try appending one more =.  If it
> still doesn't work, then you might be out of luck.

This seems to work in some cases, but not all. What's the deal with
adding these = characters? Is there an implementation error in Python, or are
other implementations of base64 more robust than they have to be?

Christoph
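On why appending = only helps sometimes: base64 turns every 3 bytes into 4 characters, and = pads the final group out to 4 characters. If the unpadded data is 2 or 3 characters past a multiple of 4, the missing padding can be restored; if it is 1 character past, a character has been lost and no amount of padding will fix it. Data missing from the middle is not detected either way. A minimal Python 3 sketch that computes the padding from the length instead of retrying (the helper name b64decode_lenient is hypothetical; note that Python 3 raises binascii.Error here rather than TypeError):

```python
import base64
import binascii


def b64decode_lenient(data: bytes) -> bytes:
    """Decode base64 that may have lost its trailing '=' padding (hypothetical helper)."""
    # Remove whitespace such as the line breaks mail software inserts.
    cleaned = b"".join(data.split())
    # Drop any existing padding so it can be recomputed from the length.
    cleaned = cleaned.rstrip(b"=")
    if len(cleaned) % 4 == 1:
        # A single leftover character carries only 6 bits: the input is truncated.
        raise binascii.Error("truncated base64 input: padding cannot fix this")
    # Pad up to the next multiple of 4 characters.
    cleaned += b"=" * (-len(cleaned) % 4)
    # validate=True rejects characters outside the base64 alphabet.
    return base64.b64decode(cleaned, validate=True)


print(b64decode_lenient(b"aGVsbG8"))             # b'hello' (one '=' was missing)
print(b64decode_lenient(b"aGVs\nbG8gd29ybGQ"))   # b'hello world' (wrapped, unpadded)
```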


-- 
http://mail.python.org/mailman/listinfo/python-list


Error on base64.b64decode() ?!

2007-10-12 Thread Christoph Krammer
Hello everybody,

I am using a python script to extract images from email messages. This
works fine for some messages, but not all attached images can be
decoded. I use the following code to decode the image and save it to a
database:

import base64
import md5

try:
  imagedec = base64.b64decode(imageenc)
  imagehash = md5.new(imagedec).hexdigest()
  dbcurs.execute(...)  # save image (query not shown in the post)
except TypeError, e:
  print "Error '%s' in Message %s" % (e, dbrow[1])
  print imageenc

The problem is that for some images, b64decode() raises a
TypeError: Incorrect padding. But if I paste the content of imageenc
into another base64 decoder (like this one:
http://www.php-einfach.de/base64_generator.php?code=1), I get the
correct result.

One example content is included below.

Is this a bug in the base64 module, or is there something wrong with
my code?

Regards,
 Christoph

---
R0lGODlhOwKDAfcAAICAAICAgIAAgACAgICAgMDAwP8AAAD/AP////
8A/wD/
/
wAA
MwAAZgAAmQAAzAAA/
wAzAAAzMwAzZgAzmQAzzAAz/wBm
AABmMwBmZgBmmQBmzABm/wCZAACZMwCZZgCZmQCZzACZ/wDMAADMMwDMZgDMmQDMzADM/
wD/AAD/
MwD/ZgD/mQD/zAD//zMAADMAMzMAZjMAmTMAzDMA/zMzADMzMzMzZjMzmTMzzDMz/
zNmADNmMzNm
ZjNmmTNmzDNm/zOZADOZMzOZZjOZmTOZzDOZ/zPMADPMMzPMZjPMmTPMzDPM/zP/ADP/
MzP/ZjP/
mTP/zDP//2YAAGYAM2YAZmYAmWYAzGYA/2YzAGYzM2YzZmYzmWYzzGYz/
2ZmAGZmM2ZmZmZmmWZm
zGZm/2aZAGaZM2aZZmaZmWaZzGaZ/2bMAGbMM2bMZmbMmWbMzGbM/2b/AGb/M2b/Zmb/
mWb/zGb/
/5kAAJkAM5kAZpkAmZkAzJkA/5kzAJkzM5kzZpkzmZkzzJkz/
5lmAJlmM5lmZplmmZlmzJlm/5mZ
AJmZM5mZZpmZmZmZzJmZ/5nMAJnMM5nMZpnMmZnMzJnM/5n/AJn/M5n/Zpn/mZn/zJn//
8wAAMwA
M8wAZswAmcwAzMwA/8wzAMwzM8wzZswzmcwzzMwz/8xmAMxmM8xmZsxmmcxmzMxm/
8yZAMyZM8yZ
ZsyZmcyZzMyZ/8zMAMzMM8zMZszMmczMzMzM/8z/AMz/M8z/Zsz/mcz/zMz///8AAP8AM/
8AZv8A
mf8AzP8A//8zAP8zM/8zZv8zmf8zzP8z//9mAP9mM/9mZv9mmf9mzP9m//+ZAP+ZM/+ZZv
+Zmf+Z
zP+Z///MAP/MM//MZv/Mmf/MzP/MAP//M///Zv//mf//zP///
yH5BAEAABAALAA7AoMB
AAj/
ANFpwoJCCi5NKOww4cZEnCZxUqTAEedLHApfLviIs4OLGyWBvqRo0sSH40BfCjXZ8fWQmx1x
xDTREOkLFJOKKimKg
+NSEyWcFVna0QTHjgtKK6X4cskThThcvkJSYmJQHBZxLhriknJTylCIxFhy
06RUCjpcfBLCISYF1MijlIhxVIoiLFSJGL3agSOFkjhKEnFRIiuS6M6tf8+SRcdkokWUxFz8ZeJC
qYuRG4syscONJS4XbkWC
+tlQ02aRESty3CvFqiZcuNCJAyUlrB0pfJhQguMith1QE32h88WELERK
WJaqHMpNIlfD3CD6goOSIhbAL3E99bpwb9TKU0G5/5h
+fDNVXCgOonArXbcdLENV9lY5UiptYhP9
MkGBovFS2EZhtZhwQ31F0Wt8ZDWUbg/
xFVdhjXE13kEx8bGVHZSgBRETxOw2mFJwkMVRSRCpNJVX
f2kSkna+0LBSekwkuNlwGNph41Y/
RUUVCuhI0Vxf6KxEUWkUCfbTT2S9JkVWKMHRozibOQVHV1CF
aB5ZoIx1WXVLyoWaOFBSZCMWRY20Uo7T1eeLYAqxRVtaQ1FSUIhehWjmTVn1BcdFKBCF0V5QUrZS
SAbtxY0L3OBSY5+aMGRTk0xEupQdKCTKmG4SDfaanSoi16hSg8LB25pIRbfRQZQhhYKcayqqyXgv
Ff/
2GqUqdcZRpboRBxt1XK10G0Go6RbXRQZl1NiUR930EmBHUSdffTf5ZBActPkC2Ge4UMcRlJ1J
plKfjUW6H6VL4lJcQiPtWZu4je6JYUHmGmWtRNxQV1OVT/EkBbqU
+KWJXNFNR0lzPxXkAnNU+SVY
Qs16Ve++fAWnqVeXYaGiJqNBGRFRu
+HFVIZEUZXgoSk6tZlkt10LClSVRbkRS3jd5qPCPl1GGWWK
ZjuaTwIKBy9GU+7lQkaDdaYUh9mSlltkQ
+npwqqRcbUmQhmO1SOGjblolLmhbcbsmgyZ9tpOrFFi
YUp+cYNOQoOVZpRpcKiHUn/PfYfeZ/
GBKepClU3/5CBgO7Ekp0USwTFVcYgnV91LzvrSZ1QBjtbh
RjvRRrhZGQYdEkUiYQgzhmbjQlCAU7bVFl96nbUkRkSxNbZIPDGBhc3ccINFc6sxxBKY9FLy9Fco
vQpRZ2tF5epQ6fGxZkJo7dURcqI6pfdLjUXHl8XEzSq7WSH1RpVFa/
G03F8i3XQWhsLjAkpFoDm0
1ZLUdYbuQW5VthRXIcr5aqKWCYYXH2ThC1mK06+LII5QS9rLZgyHAsON5EOxYQKAolKQiKCABlvJ
n4puA5E4iYMPrdnKa/rmnZUEx0ZmEkm2ItUa0BhkKyiZTl0mksG
+zG5NqzkY9C5mPEL9hCK9Gdpv
/
yISkZ9kyywowAKIDGeQkYSFMEXxyL6A1sBXpeQ2vUGNtlQCFSShw0OBUZFWctMbjKwpJHGjiF/
I
VZSpVEkwphkRlDD2EGtJ8FUXyY8AEzYv4hANMF4ZXEea1CGPnIUnT5TZ4AjmkIgoRGi
+IAhbJPKP
SlrykpjMpCY3yclOevKToAylKEdJylKa8pSoTKUqV8nKVrrylbCMpSxnSctL9gkpYKqlLnfJy176
8pfADKYwh0nMYhKzOBXhijGXycxmOvOZ0IymNKfZTN5QhSzUzKY2t8nNbtISBd4MpzHrVStciPOc
6EynOtfJznaSclQUgYM750nPetrznvik5bbgUP+SfPrzn/
8AJ0AHSlB07udpWymoQhfK0IY6tJZL
8g8lHkrRilr0ohaljl
+shdGOepSZAv2oSEsZKEXJc6Qo7WZIU8rSgXZxJr5oqUxnStOashIdIbqM
C2zK05769KeRMttmfkrUi660qJY8qkxdEJyhIfWpUK2oUmXqCwCqRApRzapWCzpVY3aVmU8j3ES3
Stay2vOrI8XOrMzK1ramE60jDQlOEeLWutp1m3D96Fq6iIW7gpIGOqCBMQF7ScJiErCCxWRgdZBJ
xGZysY0N7GMNW1jJKpaylnRsKRG7WM4mVpOe7awoPftXy2YWsqbU7CVRq0nWVlK0qcxrR00zHeL/
+PWTrt3kZy3J2Faa9h+/fS1mgcvZy/
a2srv9B2mRO9njCje3uQ1lZ6c73NWGVrWcpG50jYtc55Iy
utg97G9hi0rZdhQ2vugQVm/
bSR0EN7KX9e1u3btJ17oXsM7dLnDHe9/8vve19D3tfVcb4M26d7H9
9W5lE4zg9jIYvwom8GchnMoBc7e+lD1wcktp3o72yI8drqt
+K4ndEZMSs9Ul8XiLe9oNP7e7802x
clfs2vCa0sS8Ha6JWaziFIOTtTjGbXVt3OIis9dJhAIFezupWQ0LGLIItiyUbQzdGGP4uFPGsozx
W2QLzzjCxMUyfYE83BD3uLTJ5bFiwQtmFfc2/8if5PKa24xaL98WCzfZDzeWnN3rSpm0oe2xdpmb
Yy2DWbWORfR/w1xoRr8YtKZNdKQXPUo4s9nFjv4uYeEsZAUT
+cyfrutteLQTPtcXyBa2L38nnV/M
FnjGTA7wgff7Zko7mstynnV2/8xY1MpZlb
+OtYJ1Hdk2OxjCMq50lJFNZ2Ybm60iecraTN1aHRu2
xCgObqiJa932bjrD3za2pCcs2W1nudGc9qSlh7zodFuXuqvU7nIhHWhWmrmi/
LTRbagd2TTzmtCP
bjQnsx1rwSra4LY+94t3XO5JIxzTJ7a1kd
+tW4lXfNzAlveIp5tpvzJ1bQ3k94UFjm3nhjfUBP/3
dq/BvXJxN5zcD1d5phWuynV7ut3JVnli3V3tm0Ocxds2656iFFOR8za4qXb1que8626ru7/
GtTOk
pc5gdVOYwA1eZbAH/moAH5rYRz821iv83o0rHeJmNUpyNmJ0gZOcxgA/
c8UrG2fyChrtz7V2kOd9
d/k+O+CA77fJze5dnlNcvD/XtsW1uhLY9Kntcjcy2IOddeVe8PDcDrvVwU7rnAP45Ha/
st67jng0
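Rather than base64-decoding the attachment text by hand, the standard email package can undo the transfer encoding itself, which avoids mistakes in cutting the encoded text out of the message. A Python 3 sketch of that alternative technique (the helper name extract_images is hypothetical, and this is not a diagnosis of the example above):

```python
import email
from email import policy
from email.message import EmailMessage


def extract_images(raw: bytes) -> list:
    """Return (filename, bytes) for every image/* part of a raw message."""
    msg = email.message_from_bytes(raw, policy=policy.default)
    images = []
    for part in msg.walk():
        if part.get_content_maintype() == "image":
            # decode=True reverses the Content-Transfer-Encoding (e.g. base64).
            images.append((part.get_filename(), part.get_payload(decode=True)))
    return images


# Usage: build a small message with a GIF attachment and extract it again.
gif = b"GIF89a" + bytes(range(32))
outgoing = EmailMessage()
outgoing["Subject"] = "test"
outgoing.set_content("see attachment")
outgoing.add_attachment(gif, maintype="image", subtype="gif", filename="a.gif")
print(extract_images(outgoing.as_bytes()) == [("a.gif", gif)])  # True
```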

Re: MemoryError on reading mbox file

2007-09-13 Thread Christoph Krammer
On 12 Sep., 16:39, Istvan Albert [EMAIL PROTECTED] wrote:
> This line reads an entire message into memory as a string. Is it
> possible that you have a huge email in there (hundreds of MB) with
> some attachment encoded as text?

No, the largest single message in the mbox is about 100 KB.


> For now I would recommend that you split your mbox file into several
> smaller ones. (I think all you need is to split at the To: fields) and
> run your script on these individual files.

I got it to work by splitting the mbox file into single files, one
per message, with the git-mailsplit tool that is included in the
gitk package. This solved the problem for now.
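A rough pure-Python counterpart to splitting the mbox first: read the file line by line and yield one raw message at a time, so only the current message is held in memory. This sketch assumes every message starts with a line beginning with "From " (the mbox separator) and that such lines inside bodies were escaped as ">From "; the name iter_mbox_messages is hypothetical:

```python
import io
from typing import BinaryIO, Iterator


def iter_mbox_messages(fp: BinaryIO) -> Iterator[bytes]:
    """Yield raw messages from an mbox stream one at a time (hypothetical helper)."""
    current = []
    for line in fp:
        # A line starting with b"From " marks the start of the next message.
        if line.startswith(b"From ") and current:
            yield b"".join(current)
            current = []
        current.append(line)
    if current:
        yield b"".join(current)


sample = (
    b"From a@example.com Sat Sep 12 10:00:00 2007\n"
    b"Subject: one\n\nfirst body\n\n"
    b"From b@example.com Sat Sep 12 10:05:00 2007\n"
    b"Subject: two\n\nsecond body\n"
)
for raw in iter_mbox_messages(io.BytesIO(sample)):
    print(raw.splitlines()[1])  # b'Subject: one', then b'Subject: two'
```

With a real file the stream would come from open(path, "rb"), and each yielded message could be parsed with email.message_from_bytes if needed.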

Thanks for all your help.

Christoph




MemoryError on reading mbox file

2007-09-12 Thread Christoph Krammer
Hello everybody,

I have to convert a huge mbox file (~1.5 GB) to MySQL.

I tried with the following simple code:

import mailbox
import md5
import MySQLdb

for m in mailbox.mbox(fileName):

  msg  = m.as_string(True)
  hash = md5.new(msg).hexdigest()

  try:
    dbcurs.execute("INSERT INTO archive (hash, msg) VALUES (%s, %s)",
                   (hash, msg))
  except MySQLdb.OperationalError, err:
    print "%s: Error (%d): %s" % (fileName, err[0], err[1])
  else:
    print "%s: Message successfully added to database" % hash

The problem seems to be the size of the file: every time I run the
script, after about 2 messages the following error occurs:

Traceback (most recent call last):
  File "email_to_mysql_mbox.py", line 21, in <module>
    for m in mailbox.mbox(fileName):
  File "/usr/lib/python2.5/mailbox.py", line 98, in itervalues
    value = self[key]
  File "/usr/lib/python2.5/mailbox.py", line 70, in __getitem__
    return self.get_message(key)
  File "/usr/lib/python2.5/mailbox.py", line 633, in get_message
    string = self._file.read(stop - self._file.tell())
MemoryError

My system has 512 MB RAM and 768 MB swap, which seem to run out at an
early stage. Is there a way to free the memory used by messages
that have already been processed?

Thanks and regards,
 Christoph
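For comparison, a Python 3 sketch of the loop body that stores a hash and the raw message per row. It uses hashlib.md5 (the old md5 module no longer exists in Python 3) and a parameterized query. sqlite3 stands in for MySQL here so the example runs on its own; MySQLdb uses %s placeholders where sqlite3 uses ?. Memory stays bounded only if messages is a lazy iterator that yields one message at a time. The name archive_messages is hypothetical:

```python
import hashlib
import sqlite3
from typing import Iterable


def archive_messages(messages: Iterable[bytes], conn: sqlite3.Connection) -> int:
    """Insert (md5 hex digest, raw message) rows; return the number of rows stored."""
    cur = conn.cursor()
    count = 0
    for raw in messages:
        digest = hashlib.md5(raw).hexdigest()
        # The driver fills in the placeholders, so no manual quoting is needed.
        cur.execute("INSERT INTO archive (hash, msg) VALUES (?, ?)", (digest, raw))
        count += 1
    conn.commit()
    return count


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE archive (hash TEXT, msg BLOB)")
print(archive_messages([b"Subject: one\n\nbody\n", b"Subject: two\n\nbody\n"], conn))  # 2
```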



Re: MemoryError on reading mbox file

2007-09-12 Thread Christoph Krammer
On 12 Sep., 12:20, David [EMAIL PROTECTED] wrote:
> It may be that Python's garbage collection isn't keeping up with your app.
>
> You could try periodically forcing it to run, e.g.:
>
> import gc
> gc.collect()

I tried this, but it did not solve the problem. When I invoke garbage
collection after every loop iteration, the memory usage shown by top
stays the same for a very long time until at some point (at a different
message each time), while executing the loop header, memory usage climbs
until RAM and swap both hit 100%, and then the MemoryError occurs.

Could there be a problem in the mailbox module when processing very
large files?

Regards,
 Christoph




re.sub does not replace all occurrences

2007-08-07 Thread Christoph Krammer
Hello everybody,

I wanted to use re.sub to strip all HTML tags out of a given string. I
learned that there are better ways to do this without the re module,
but I would like to know why my code is not working. I use the
following:

import re

def stripHtml(source):
  source = re.sub("[\n\r\f]", " ", source)
  source = re.sub("<.*?>", "", source, re.S | re.I | re.M)
  source = re.sub("&(#[0-9]{1,3}|[a-z]{3,6});", "", source, re.I)
  return source

But the result still contains some tags. When I apply the second line
multiple times, all tags disappear, but since HTML tags cannot
overlap, I do not understand this behavior. There is even a
difference when I omit the re.I (IGNORECASE) option: without it,
some tags containing only capital letters (like </FONT>) were
kept in the string after one pass but removed after
multiple passes.

Can anyone tell me why this regex behaves like this?

Thanks and regards,
 Christoph



Re: re.sub does not replace all occurrences

2007-08-07 Thread Christoph Krammer
Neil Cerutti wrote:
> In other words, the fourth argument to sub is count, not a set of
> re flags.

I knew it had to be something very stupid.

Thanks a lot.
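To spell out the effect: re.S | re.I | re.M has the integer value 26, so passed positionally it became count and each call removed at most 26 tags. That is why repeated calls eventually removed everything, and why dropping re.I (value 2) changed which tags survived. A Python 3 sketch passing the flags by keyword (the tag and entity patterns assume the <, > and & characters that this archive stripped from the post; strip_html is a hypothetical name):

```python
import re

FLAGS = re.S | re.I | re.M
print(int(FLAGS))  # 26, which positionally became re.sub's count argument


def strip_html(source: str) -> str:
    # Replace newlines, carriage returns and form feeds with spaces.
    source = re.sub(r"[\n\r\f]", " ", source)
    # Remove tags; flags passed by keyword so they cannot be taken as count.
    source = re.sub(r"<.*?>", "", source, flags=FLAGS)
    # Remove numeric (&#160;) and named (&nbsp;) entities.
    source = re.sub(r"&(#[0-9]{1,3}|[a-z]{3,6});", "", source, flags=re.I)
    return source


print(strip_html("<p>x</p>" * 20))  # twenty x characters, no tags left
```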



Read binary data from MySQL database

2007-05-10 Thread Christoph Krammer
Hello,

I am trying to write a Python application with wx that shows images from a
MySQL database. I use the following code to connect and fetch data when
an event is triggered:

import MySQLdb

dbconn = MySQLdb.connect(host="localhost", user=..., passwd=...,
                         db="images")
dbcurs = dbconn.cursor()
dbcurs.execute("SELECT imgdata FROM images LIMIT 1")
imgstring = dbcurs.fetchone()[0]
frame.showImage(imgstring)

Within my frame, the following method is defined:

def showImage(self, imgstring):
  imgdata = StringIO.StringIO()
  imgdata.write(imgstring)
  print imgdata.getvalue()
  wx.ImageFromStream(imgdata, wx.BITMAP_TYPE_GIF)
  panel = wx.Panel(self, -1)
  self.panel = panel

But this does not work. The converter says that the data is not valid
GIF. When I print the content of imgstring after the database select
statement, it contains something like this:

array('c', 'GIF89aL\x01=\x01\x85\x00\x00\x00\x00\x00\xff\xff\xff
\x00\xff\xff\xff[...]\x00\x00;')

When I print imgstring[1], the result is 'I'. So I don't quite
understand what this printed result means, or why my input should be
invalid. The data in the database is correct; I can restore the image
with tools like the MySQL Query Browser.

Thanks in advance,
 Christoph
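Two details in the code above plausibly explain the "not valid GIF" error. First, fetchone() returned an array('c', ...) rather than a string, which is what the printed repr shows; as far as I know, Python 2's StringIO.write() converts a non-string argument with str(), which for an array is that repr text rather than the raw bytes. Second, after write() the stream position sits at the end, so a reader starting from the current position sees nothing. A Python 3 sketch that normalizes the column value to bytes and returns a stream positioned at the start (blob_to_stream is a hypothetical name; the wx call is left out so the sketch runs on its own):

```python
import io
from array import array


def blob_to_stream(value) -> io.BytesIO:
    """Wrap a BLOB column value (bytes, bytearray, array, memoryview) in a readable stream."""
    # bytes() copies any buffer-protocol object, so array/memoryview become raw bytes.
    data = bytes(value)
    # BytesIO(data) starts at position 0, unlike write(), which leaves it at the end.
    return io.BytesIO(data)


raw = b"GIF89a" + bytes(8)
stream = blob_to_stream(array("B", raw))
print(stream.read(6))  # b'GIF89a'

# The pitfall from the post: after write(), read() starts at the end.
pitfall = io.BytesIO()
pitfall.write(raw)
print(pitfall.read())  # b''
```

The stream could then be handed to wx.ImageFromStream(stream, wx.BITMAP_TYPE_GIF) as in the post, assuming the wx version in use reads from a file-like object's current position.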



Broken pipe with os.popen3()

2007-04-10 Thread Christoph Krammer
Hello everybody,

I am trying to use an external OCR tool to convert some binary image data to
text. The image is in one variable, and the resulting text should be stored
in another. I use the following code:

  (si, so, se) = os.popen3('ocrad')
  si.write(frame)
  si.close()
  messagetext += so.read()

This code leads to a broken pipe error. I think this is because the
command already writes data to stdout after receiving the first part of
the input. But when I change the order of the code lines, i.e. reading
from the pipe so before writing to si, the program hangs, because
nothing is written to stdout before the first bytes are written to
stdin. Any idea how to solve this? How do I read and write
simultaneously?

Thanks in advance,
 Christoph
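One way to read and write at the same time is to feed stdin from a second thread while the main thread reads stdout, so neither side can fill its pipe buffer and block the other. A Python 3 sketch using subprocess instead of os.popen3 (pipe_through is a hypothetical name; the post's ocrad command would go in place of the stand-in used below). subprocess's Popen.communicate() does this internally and is usually the simpler choice:

```python
import subprocess
import sys
import threading


def pipe_through(cmd: list, data: bytes) -> bytes:
    """Write data to cmd's stdin from a thread while reading its stdout."""
    proc = subprocess.Popen(cmd, stdin=subprocess.PIPE, stdout=subprocess.PIPE)

    def feed() -> None:
        try:
            proc.stdin.write(data)
        except BrokenPipeError:
            pass  # the child exited or stopped reading before consuming all input
        finally:
            try:
                proc.stdin.close()  # EOF: tells the child the input is complete
            except BrokenPipeError:
                pass

    writer = threading.Thread(target=feed)
    writer.start()
    output = proc.stdout.read()  # runs concurrently with the writer thread
    writer.join()
    proc.stdout.close()
    proc.wait()
    return output


# Stand-in for ocrad: a child that upper-cases whatever it reads.
upper = [sys.executable, "-c", "import sys; sys.stdout.buffer.write(sys.stdin.buffer.read().upper())"]
print(pipe_through(upper, b"ocr me"))  # b'OCR ME'
```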



Using os.popen3() to get binary data

2007-04-06 Thread Christoph Krammer
Hello everybody,

I need to get the individual frames from a GIF image in my Python
script and want to use the giftopnm program from netpbm to extract the
frames and convert them directly to PNM files. I tried the
following code:

for image in images:
  if (image[0:3] == 'GIF'):
    (si, so, se) = os.popen3('giftopnm -image=all', 'b')
    si.write(image)
    frame = so.readlines()

But with this code the script just hangs. When I interrupt the script,
I get the following error message:
Traceback (most recent call last):
  File "/home/tiger/stock-spam/scripts/all_in_one.py", line 46, in ?
    frames = so.readlines()
KeyboardInterrupt
close failed: [Errno 32] Broken pipe

Can somebody tell me which command I have to use so that the pipe is
closed when giftopnm returns? The program just prints the
converted images to stdout and terminates.

Thanks in advance,
 Christoph



Re: Using os.popen3() to get binary data

2007-04-06 Thread Christoph Krammer
Just got the solution...

After sending the image data with si.write(image), I have to close
the pipe with si.close() to tell the program that the input is complete
and it should convert the image. Now everything works fine.

Christoph
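The same fix expressed with Python 3's subprocess module: passing input= to subprocess.run() writes the data, closes stdin so the child sees end-of-file (the step si.close() performed), and then collects stdout until the child exits. A sketch assuming giftopnm reads the GIF from standard input as it did with popen3 above; gif_to_pnm is a hypothetical name, and the command is a parameter so a stand-in program can be used without netpbm. Splitting the concatenated output into individual frames is not shown:

```python
import subprocess
import sys


def gif_to_pnm(gif: bytes, cmd=("giftopnm", "-image=all")) -> bytes:
    """Run a converter on one GIF and return its stdout (hypothetical helper)."""
    # input= makes run() write the data, close stdin (EOF) and read all of stdout.
    # check=True raises CalledProcessError if the converter exits with an error.
    result = subprocess.run(list(cmd), input=gif, stdout=subprocess.PIPE, check=True)
    return result.stdout


# Stand-in converter: reports how many bytes it read from stdin.
count_bytes = (sys.executable, "-c", "import sys; print(len(sys.stdin.buffer.read()), end='')")
print(gif_to_pnm(b"GIF89a" + bytes(4), cmd=count_bytes))  # b'10'
```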




How to access multiple group matches?

2007-04-06 Thread Christoph Krammer
Hello,

I want to use the re module to split a data stream that consists of
several blocks of data. I use the following code:

iter = re.finditer('^(HEADER\n.*)+$', data)

The data variable contains binary data that has the word HEADER in it
in some places, followed by binary data up to the next
occurrence of HEADER or the end of the file. But when I iterate over
iter, I only get one match, and this match contains only one group. How
do I access the other matches? data may contain dozens of them.

Thanks in advance,
 Christoph
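Several things keep that pattern from producing one match per block: a repeated group such as (...)+ keeps only its last repetition in group(1), without re.M the anchors ^ and $ match only at the start and end of the whole string, and without re.S the dot does not match newlines, which binary data is full of. A Python 3 sketch that instead matches each block on its own, using a lazy .*? that stops just before the next HEADER line or at the end of the data (\Z). It assumes data is bytes and that the marker is exactly HEADER followed by a newline; if that sequence can also occur inside the binary payload, no regex can tell the two apart. split_blocks is a hypothetical name:

```python
import re

# re.S lets "." match newline bytes; the lookahead stops each block before the next marker.
BLOCK_RE = re.compile(rb"HEADER\n(.*?)(?=HEADER\n|\Z)", re.S)


def split_blocks(data: bytes) -> list:
    """Return the payload following each HEADER marker."""
    return [m.group(1) for m in BLOCK_RE.finditer(data)]


data = b"HEADER\nabc\x00\xffHEADER\n\n123HEADER\nlast"
print(split_blocks(data))  # [b'abc\x00\xff', b'\n123', b'last']
```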
