On 04/17/2015 09:19 AM, subhabrata.bane...@gmail.com wrote:
I am having few files in default encoding. I wanted to change their encodings,
preferably in "UTF-8", or may be from one encoding to any other encoding.


You neglected to specify what Python version this is for. Other information that'd be useful is whether the file size is small enough that two copies of it will all fit reasonably into memory.

I'll assume it's version 2.7, because of various clues in your sample code. But if it's version 3.x, it could be substantially easier.

I was trying it as follows,

    >>> import codecs
    >>> sourceEncoding = "iso-8859-1"
    >>> targetEncoding = "utf-8"
    >>> source = open("source1","w")

mode "w" will truncate the source1 file, leaving you nothing to process. i'd suggest "r"

    >>> target = open("target", "w")

It's not usually a good idea to use the same variable for both the file name and the opened file object. What if you need later to print the name, as in an error message?

    >>> target.write(unicode(source, sourceEncoding).encode(targetEncoding))

I'd not recommend trying to do so much in one line, at least until you understand all the pieces. Programming is not (usually) a contest to write the most obscure code, but rather to make a program you can still read and understand six months from now. And, oh yeah, something that will run and accomplish something.

>
> but it was giving me error as follows,
> Traceback (most recent call last):
>    File "<pyshell#6>", line 1, in <module>
>      target.write(unicode(source, sourceEncoding).encode(targetEncoding))
> TypeError: coercing to Unicode: need string or buffer, file found


if you factor this you will discover your error. Nowhere do you read the source file into a byte string. And that's what is needed for the unicode constructor. Factored, you might have something like:

     encodedtext = source.read()
     text = unicode(source, sourceEncoding)
     reencodedtext = text.encode(targetEncoding)
     target.write(encodedText)

Next, you need to close the files.

    source.close()
    target.close()

There are a number of ways to improve that code, but this is a start.

Improvements:

Use codecs.open() to open the files, so encoding is handled implicitly in the file objects.

     Use with... syntax so that the file closes are implicit

read and write the files in a loop, a line at a time, so that you needn't have all the data in memory (at least twice) at one time. This will also help enormously if you encounter any errors, and want to report the location and problem to the user. It might even turn out to be faster.

You should write non-trivial code in a text file, and run it from there.

--
DaveA
--
https://mail.python.org/mailman/listinfo/python-list

Reply via email to