Re: [PHP] Is it possible to save a file with UTF-8 encoding and no BOM using PHP?

Thanks also, Richard! You were right on.

Just a note: I just visited the W3C validation service again, and it seems they have recently updated it. It no longer complains if it finds a BOM in your document. So it would appear that the BOM no longer troubles enough XML parsers to be relevant. Still, it is nice to have a program, like I do, that has that flexibility.

-Jon

Richard Lynch wrote:
> On Thu, April 28, 2005 4:14 am, Jon M. said:
>> No matter what I do to the strings to encode them in whatever format
>> before using fwrite, it ALWAYS seems to end up writing the actual file
>> in iso-8859-1.
>
> How do you know? What are you using to determine the format of the file?
>
> We are contending that either you are *not* writing UTF-8 data, but are
> writing iso-8859-1 data, or the software telling you that it's not UTF-8
> is just plain *wrong*.
>
> fwrite just takes your data and dumps it on the hard drive. It doesn't
> know UTF-8 from U2.
>
>> Isn't the encoding of the characters in PHP's strings, and the encoding
>> of the actual binary file on your hard drive, two totally different
>> things? Or am I just misinformed?
>
> You are mis-informed.
>
>> How do you actually control the way the binary file itself is written,
>> and not just the text that is saved in the file?
>
> If you are using Windows, then *WINDOWS* is, perhaps, guessing at the
> binary format based on the file 'extension' (.txt) and on the contents.
>
> First, try renaming the file to, err, whatever Windows thinks UTF-8 file
> extensions should be... .utf8 ??? Whatever Notepad uses.
>
> Next, forget what Windows desktop tells you. It's bull. When you get the
> data back out of the file, what format is it?
>
> PS You may be confusing Windows by writing UTF-8 without the BOM, and so
> Windows then thinks it's iso-8859-1, because it's no longer a valid UTF-8
> file! You can make Windows happy; or you can make W3C happy. Not both.
>
> --
> Like Music?
> http://l-i-e.com/artists.htm

--
PHP General Mailing List (http://www.php.net/)
To unsubscribe, visit: http://www.php.net/unsub.php
Re: [PHP] Is it possible to save a file with UTF-8 encoding and no BOM using PHP?

Jon M. wrote:
> No matter what I do to the strings to encode them in whatever format
> before using fwrite, it ALWAYS seems to end up writing the actual file
> in iso-8859-1.
>
> Isn't the encoding of the characters in PHP's strings, and the encoding
> of the actual binary file on your hard drive, two totally different
> things? Or am I just misinformed?

A file is completely defined by its contents. If you fwrite UTF-8 data to the file, then it is a UTF-8 file. Whether your editor, or whatever you are using to determine that the file is being written as iso-8859-1, is smart enough to pick this up is a completely different question.

Why don't you try creating the same contents with PHP and with your preferred text editor, and then compare the results? Perhaps your editor is dropping a hint somewhere in the file that you are not writing from PHP.

-Rasmus
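
For illustration, here is a minimal sketch of the comparison suggested above, assuming PHP 7 or later. The sample string and the temporary file path are made up for this example and are not from the thread. It writes a UTF-8 string with fwrite and then dumps the bytes that actually landed on disk, which can be compared with a hex dump of the file your editor saves.

```php
<?php
// Hypothetical sample text: "\xC3\xA9" is the UTF-8 byte sequence for "é".
$text = "caf\xC3\xA9\n";

// Hypothetical path: a temporary file used only for this illustration.
$path = tempnam(sys_get_temp_dir(), 'utf8');

$fh = fopen($path, 'wb');   // 'b' stops Windows from translating line endings
fwrite($fh, $text);         // fwrite copies the bytes as-is; it adds no BOM
fclose($fh);

// Read the file back and show its raw bytes as hex.
$bytes = file_get_contents($path);
$hex = bin2hex($bytes);
echo $hex, "\n";            // prints 636166c3a90a
unlink($path);
```

If an editor's copy of the same text starts with `efbbbf`, the only difference is the BOM the editor prepended.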
Re: [PHP] Is it possible to save a file with UTF-8 encoding and no BOM using PHP?

Wow, nice to hear from the guy who created PHP! :)

I need to make 2 things clearer though:

1. First I want to know how to make the actual binary file that is written be encoded as UTF-8. This is my PRIMARY question.
2. How to have fwrite write the file without the BOM (to make it more compatible with certain XML parsers, per the W3C's recommendation).

No matter what I do to the strings to encode them in whatever format before using fwrite, it ALWAYS seems to end up writing the actual file in iso-8859-1.

Isn't the encoding of the characters in PHP's strings, and the encoding of the actual binary file on your hard drive, two totally different things? Or am I just misinformed?

Example: I open up Windows Notepad, type some stuff into it, choose File, Save As, Encoding: UTF-8, and click Save. So how do I do this SAME thing using PHP? Could someone give me an actual code example of how to do that? I'm just so lost.

How do you actually control the way the binary file itself is written, and not just the text that is saved in the file?

Rasmus Lerdorf wrote:
> Jon M. wrote:
>> Richard Lynch wrote:
>>> On Thu, April 21, 2005 5:07 pm, Jon M. said:
>>>> I am trying to have a file that I generated with PHP saved as UTF-8
>>>> without the BOM (Byte Order Mark). Does PHP do anything like this?
>>>> I am a beginner with PHP, but very technically experienced otherwise.
>>>>
>>>> I'm talking about the FILE encoding here, just to be clear.
>>>>
>>>> e.g. fopen($what_ever_file, 'a+'): now I want PHP to save the file
>>>> itself as UTF-8, NOT the system default.
>>>>
>>>> I have searched for hours to find an answer, but have not found any
>>>> info on the subject. Does PHP have any ability to create a text file
>>>> saved in UTF-8 encoding???
>>>
>>> Maybe I'm just being dumb, but I think if you UTF-8 encode your data,
>>> and http://php.net/fwrite it, you're gonna get what you want...
>>>
>>> Dunno about the Byte-Order-Mark part, but I guess you could strip it
>>> out of the UTF-8 encoded data before writing, if you wanted to.
>>>
>>> --
>>> Like Music?
>>> http://l-i-e.com/artists.htm
>>
>> That was the first thing I tried, and it doesn't seem to work (it always
>> saves in the Windows default encoding). Unless I'm missing something
>> about what you can do with fwrite. Did you actually test that before you
>> replied, and find that it works? If so, how?
>>
>> The Byte Order Mark is a part of the binary file that is written; how
>> would one go about stripping it out??
>>
>> BTW, note to PHP developers: If fwrite had an encoding parameter, e.g.
>> UTF-8, that would be REALLY handy.
>
> Strings in PHP are binary-safe and character-encoding neutral. fwrite
> doesn't have a clue what it is writing, it just writes what is in memory.
> I'd question why you would want to strip the BOM. Any modern system deals
> with the byte-order-mark correctly. But you can simply strip it manually
> if it is present in the first 3 bytes (EF BB BF for UTF-8) before your
> fwrite if you really need to get rid of it.
>
> -Rasmus
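
A minimal sketch of stripping the BOM manually before fwrite, assuming PHP 7 or later. The helper name `strip_utf8_bom`, the constant `UTF8_BOM`, and the sample data are hypothetical, chosen for this example. The UTF-8 BOM is the three-byte sequence EF BB BF, so the check looks at the first three bytes of the string.

```php
<?php
// The UTF-8 byte order mark is the three bytes EF BB BF.
const UTF8_BOM = "\xEF\xBB\xBF";

// Hypothetical helper: returns $data without a leading UTF-8 BOM, if any.
function strip_utf8_bom(string $data): string
{
    if (strncmp($data, UTF8_BOM, 3) === 0) {
        return substr($data, 3);
    }
    return $data;
}

// Hypothetical sample: a short XML fragment with a BOM in front.
$withBom = UTF8_BOM . "<root/>\n";
$clean = strip_utf8_bom($withBom);

// $clean can now be handed to fwrite() unchanged.
echo bin2hex($withBom), "\n";
echo bin2hex($clean), "\n";
```

Going the other way, prepending `UTF8_BOM` to the data before fwrite should give a file that starts with the same three-byte marker.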
Re: [PHP] Is it possible to save a file with UTF-8 encoding and no BOM using PHP?

> 1. First I want to know how to make the actual binary file that is
> written, to be encoded as UTF-8. This is my PRIMARY question.

I think this part is the key. If the string originated from your text editor, then your text editor needs to be in UTF-8 mode. If you are importing the data from a file or database, then you need to be sure that the data is UTF-8 encoded. If you know that the encoding is instead ISO-8859-1, then you can convert it to UTF-8 with the PHP function utf8_encode. fwrite will then write out the bytes that you presented to it.

I guess the same applies in reverse when you read the file back in to the application that will view the text.
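
A minimal sketch of the ISO-8859-1 to UTF-8 conversion described above. PHP's built-in function for this is utf8_encode, which is deprecated as of PHP 8.2, so this example swaps in `iconv` instead. That assumes the iconv extension is available, and the sample string is made up for illustration.

```php
<?php
// Hypothetical input: "café" in ISO-8859-1, where "é" is the single byte E9.
$latin1 = "caf\xE9";

// iconv(from, to, string) re-encodes the bytes; it returns false on failure.
$utf8 = iconv('ISO-8859-1', 'UTF-8', $latin1);
if ($utf8 === false) {
    throw new RuntimeException('Conversion to UTF-8 failed');
}

// In UTF-8 the same "é" becomes the two bytes C3 A9.
echo bin2hex($latin1), ' -> ', bin2hex($utf8), "\n";   // 636166e9 -> 636166c3a9
```

Converting data that is already UTF-8 a second time double-encodes it, so the conversion should only be applied when the input is known to be ISO-8859-1.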
Re: [PHP] Is it possible to save a file with UTF-8 encoding and no BOM using PHP?

On Thu, April 28, 2005 4:14 am, Jon M. said:
> No matter what I do to the strings to encode them in whatever format
> before using fwrite, it ALWAYS seems to end up writing the actual file
> in iso-8859-1.

How do you know? What are you using to determine the format of the file?

We are contending that either you are *not* writing UTF-8 data, but are writing iso-8859-1 data, or the software telling you that it's not UTF-8 is just plain *wrong*.

fwrite just takes your data and dumps it on the hard drive. It doesn't know UTF-8 from U2.

> Isn't the encoding of the characters in PHP's strings, and the encoding
> of the actual binary file on your hard drive, two totally different
> things? Or am I just misinformed?

You are mis-informed.

> How do you actually control the way the binary file itself is written,
> and not just the text that is saved in the file?

If you are using Windows, then *WINDOWS* is, perhaps, guessing at the binary format based on the file 'extension' (.txt) and on the contents.

First, try renaming the file to, err, whatever Windows thinks UTF-8 file extensions should be... .utf8 ??? Whatever Notepad uses.

Next, forget what Windows desktop tells you. It's bull. When you get the data back out of the file, what format is it?

PS You may be confusing Windows by writing UTF-8 without the BOM, and so Windows then thinks it's iso-8859-1, because it's no longer a valid UTF-8 file! You can make Windows happy; or you can make W3C happy. Not both.

--
Like Music?
http://l-i-e.com/artists.htm
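
One way to answer "what format is it?" without trusting the desktop is to inspect the bytes directly. The sketch below assumes PHP 7 or later. The helper name `describe_encoding` and the sample strings are hypothetical. It uses the documented behaviour that preg_match with the `/u` modifier fails on input that is not valid UTF-8.

```php
<?php
// Hypothetical helper: describes the raw bytes in $data, ignoring the file
// name and whatever a desktop tool reports.
function describe_encoding(string $data): string
{
    // A UTF-8 BOM is the three bytes EF BB BF at the very start.
    $hasBom = strncmp($data, "\xEF\xBB\xBF", 3) === 0;

    // With the /u modifier, preg_match() fails on invalid UTF-8, so an
    // empty pattern doubles as a validity check.
    $isUtf8 = preg_match('//u', $data) === 1;

    if (!$isUtf8) {
        return 'not valid UTF-8 (possibly ISO-8859-1)';
    }
    return $hasBom ? 'valid UTF-8 with BOM' : 'valid UTF-8 without BOM';
}

echo describe_encoding("caf\xC3\xA9"), "\n";             // valid UTF-8 without BOM
echo describe_encoding("\xEF\xBB\xBFcaf\xC3\xA9"), "\n"; // valid UTF-8 with BOM
echo describe_encoding("caf\xE9"), "\n";                 // not valid UTF-8 (possibly ISO-8859-1)
```

In practice the argument would be something like `file_get_contents($path)`. Pure ASCII text passes as valid UTF-8 and is also valid ISO-8859-1, so the check only tells the two apart when the data contains non-ASCII characters.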