Re: Converting text file to different encoding.

Dave Angel Fri, 17 Apr 2015 07:51:11 -0700

On 04/17/2015 09:19 AM, subhabrata.bane...@gmail.com wrote:

I am having few files in default encoding. I wanted to change their encodings,
preferably in "UTF-8", or may be from one encoding to any other encoding.

You neglected to specify what Python version this is for. Other information that'd be useful is whether the file size is small enough that two copies of it will fit reasonably into memory.

I'll assume it's version 2.7, because of various clues in your sample code. But if it's version 3.x, it could be substantially easier.

I was trying it as follows,

    >>> import codecs
    >>> sourceEncoding = "iso-8859-1"
    >>> targetEncoding = "utf-8"
    >>> source = open("source1","w")

Mode "w" will truncate the source1 file, leaving you nothing to process. I'd suggest "r".

    >>> target = open("target", "w")

It's not usually a good idea to use the same variable for both the file name and the opened file object. What if you need later to print the name, as in an error message?

    >>> target.write(unicode(source, sourceEncoding).encode(targetEncoding))

I'd not recommend trying to do so much in one line, at least until you understand all the pieces. Programming is not (usually) a contest to write the most obscure code, but rather to make a program you can still read and understand six months from now. And, oh yeah, something that will run and accomplish something.


>
> but it was giving me error as follows,
> Traceback (most recent call last):
>    File "<pyshell#6>", line 1, in <module>
>      target.write(unicode(source, sourceEncoding).encode(targetEncoding))
> TypeError: coercing to Unicode: need string or buffer, file found

If you factor this, you will discover your error. Nowhere do you read the source file into a byte string, and a byte string is what the unicode constructor needs; you passed it the file object instead. Factored, you might have something like:


    encodedtext = source.read()
    text = unicode(encodedtext, sourceEncoding)
    reencodedtext = text.encode(targetEncoding)
    target.write(reencodedtext)

Next, you need to close the files.

    source.close()
    target.close()
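
Putting the read, decode, encode, write and close steps together, a minimal sketch might look like the following. The function name convert_file and its parameter names are made up for illustration. It uses bytes.decode() rather than unicode(), and opens both files in binary mode, on the assumption that you want the same code to run unchanged on 2.7 and 3.x. It still reads the whole file into memory.

```python
def convert_file(source_name, target_name, source_encoding, target_encoding):
    """Re-encode a whole file in one go (hypothetical helper, not stdlib)."""
    # Binary mode: we want the raw bytes, with no implicit translation.
    source = open(source_name, "rb")
    try:
        encoded_text = source.read()
    finally:
        source.close()

    # bytes -> text -> bytes in the new encoding.
    text = encoded_text.decode(source_encoding)
    reencoded_text = text.encode(target_encoding)

    target = open(target_name, "wb")
    try:
        target.write(reencoded_text)
    finally:
        target.close()
```

Called as convert_file("source1", "target", "iso-8859-1", "utf-8"), it leaves source1 untouched and writes the converted bytes to target.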

There are a number of ways to improve that code, but this is a start.

Improvements:

Use codecs.open() to open the files, so encoding is handled implicitly in the file objects.

Use the with... syntax, so that the file closes are implicit.

Read and write the files in a loop, a line at a time, so that you needn't have all the data in memory (at least twice) at one time. This will also help enormously if you encounter any errors and want to report the location and problem to the user. It might even turn out to be faster.

You should write non-trivial code in a text file, and run it from there.
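
Here is one way those improvements could fit together, as a rough sketch rather than the only way to do it. The function name convert_file_by_line and its parameters are invented for the example; codecs.open() is the real standard-library call. Save it in a file such as convert.py (also a made-up name) and run it from there.

```python
import codecs


def convert_file_by_line(source_name, target_name,
                         source_encoding, target_encoding):
    """Re-encode a file one line at a time; return the number of lines copied.

    Hypothetical helper for illustration only.
    """
    line_count = 0
    # codecs.open() decodes on read and encodes on write, so the loop
    # body only ever sees text. The with blocks close both files,
    # even if an exception is raised part way through.
    with codecs.open(source_name, "r", encoding=source_encoding) as source:
        with codecs.open(target_name, "w", encoding=target_encoding) as target:
            for line in source:
                target.write(line)  # line endings are kept as-is
                line_count += 1
    return line_count
```

The running line_count is what you would put in an error message if a line cannot be encoded. Note that codecs.open() reads ahead in chunks, so for a decoding problem the count is only an approximate location.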


--
DaveA
--
https://mail.python.org/mailman/listinfo/python-list