I am trying to process a file that was originally created on an AS/400 as a spooled report. The file was converted to ASCII before being sent to me by e-mail. The original report is in Arabic script and so any Arabic script has been mapped to

I can’t read the whole file in unless I chop out all the (formerly) Arabic characters, because read(), readline() and readlines() all seem to think they are done too early. The problem appears to be that the conversion has produced a byte with hex value 1a, and Python is treating this as an end-of-file marker. I worked this out by opening the file in a hex editor and looking at the character just after the point where the read operation stops. The offending character is the square (unprintable) character in the file snippet below.

Start file snippet >>

MK    2005/01/10 البنك العربي(ش .م.ع)        الميزانية الموحدة - تقريـر الميزانية الشهــرية                              كما هي في

              01 : فروع دولة امارات                =========================================                              الصـفحة 

<< End file snippet

Is there a way I can pre-process this file with Python and chop out the characters (the 1a bytes) I don’t want?
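
One possible approach, offered as a rough sketch rather than a confirmed fix: open the file in binary mode ('rb') so that the 1a byte is read as ordinary data, remove it, and only then decode and split into lines. The helper names strip_ctrl_z and read_clean_lines are made up for illustration, and the cp1256 (Windows Arabic) encoding is an assumption about how the conversion mapped the Arabic script. The sketch is written for a current Python 3.

```python
def strip_ctrl_z(data):
    """Return a copy of a bytes object with every 0x1a byte removed."""
    return data.replace(b"\x1a", b"")


def read_clean_lines(path, encoding="cp1256"):
    """Read a file in binary mode, drop 0x1a bytes, and return decoded lines."""
    # "rb" matters: in binary mode the 0x1a byte is ordinary data, so the
    # read does not stop early the way a Windows text-mode read can.
    with open(path, "rb") as handle:
        raw = handle.read()
    # cp1256 is an assumption; swap in whatever code page the conversion used.
    # errors="replace" keeps going if a byte has no mapping in that code page.
    text = strip_ctrl_z(raw).decode(encoding, errors="replace")
    # Normalise Windows line endings, then split on newlines only.
    return text.replace("\r\n", "\n").split("\n")
```

With this in place, something like report = read_clean_lines('d:\\Software\\PythonScripts\\ear11050110.txt') should return every line of the report rather than stopping at the first 1a byte. Unlike readlines(), the returned lines carry no trailing newline.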

 

If I do this:

import string

report = open('d:\\Software\\PythonScripts\\ear11050110.txt').readlines()               

report is:

>>> report

['MK    2005/01/10 \xc7\xe1\xc8\xe4\xdf \xc7\xe1\xda\xd1\xc8\xed(\xd4 .\xe3.\xda)        \xc7\xe1\xe3\xed\xd2\xc7\xe4\xed\xc9 \xc7\xe1\xe3\xe6\xcd\xcf\xc9 - \xca\xde\xd1\xed\xdc\xd1 \xc7\xe1\xe3\xed\xd2\xc7\xe4\xed\xc9 \xc7\xe1\xd4\xe5\xdc\xdc\xd1\xed\xc9                              \xdf\xe3\xc7 \xe5\xed \xdd\xed\n', '              01 : \xdd\xd1\xe6\xda \xcf\xe6\xe1\xc9 \xc7']

 

Which is everything up to the hex 1a.
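
To double-check that diagnosis without a hex editor, the file can be read in binary mode and searched for the 1a byte. The sketch below is only an illustration (the function names are hypothetical, and it targets a current Python 3): it returns the offset of every 1a byte, so the first offset can be compared with the point where the text-mode read stopped.

```python
def find_byte_offsets(data, value=0x1A):
    """Return the offsets of every occurrence of one byte value in a bytes object."""
    needle = bytes([value])
    offsets = []
    pos = data.find(needle)
    while pos != -1:
        offsets.append(pos)
        pos = data.find(needle, pos + 1)
    return offsets


def find_byte_offsets_in_file(path, value=0x1A):
    """Read a whole file in binary mode and report where the byte value occurs."""
    # "rb" means the 0x1a byte is returned as data instead of ending the read.
    with open(path, "rb") as handle:
        return find_byte_offsets(handle.read(), value)
```

If find_byte_offsets_in_file('d:\\Software\\PythonScripts\\ear11050110.txt') returns a non-empty list whose first entry lines up with the end of the truncated output above, that would support the end-of-file explanation.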

 

Thanks for any prompting whatsoever.

 

Nick.

 



_______________________________________________
Python-win32 mailing list
Python-win32@python.org
http://mail.python.org/mailman/listinfo/python-win32
