Re: [PHP] Is it possible to save a file with UTF-8 encoding and no BOM using PHP?

2005-05-03 Thread Jon M.
Thanks also, Richard! You were right on.

Just a note:

I just visited the W3C validation service again, and it seems they have 
recently updated it. It no longer complains if it finds a BOM in your 
document binary. So it would appear that it's no longer an issue with enough 
XML parsers to be relevant anymore. Still, it is nice to have a 
program -like I do- that has that flexibility.

-Jon


Richard Lynch [EMAIL PROTECTED] wrote in message 
news:[EMAIL PROTECTED]
 On Thu, April 28, 2005 4:14 am, Jon M. said:
 No matter what I do to the strings to encode them in whatever format before
 using fwrite, it ALWAYS seems to end up writing the actual file in
 iso-8859-1.

 How do you know?

 What are you using to determine the format of the file?

 We are contending that either you are *not* writing UTF-8 data, but are
 writing iso-8859-1 data, or the software telling you that it's not UTF-8
 is just plain *wrong*

 fwrite just takes your data and dumps it on the hard drive.

 It doesn't know UTF-8 from U2.

 Isn't the encoding of the characters in PHP's strings, and the encoding of
 the actual binary file on your hard drive, two totally different things? Or
 am I just misinformed?

 You are mis-informed.

 How do you actually control the way the binary file itself is written, and
 not just the text that is saved in the file?

 If you are using Windows, then *WINDOWS* is, perhaps, guessing on the
 binary format based on the file 'extension' (.txt) and on the contents.

 First, try renaming the file to, err, whatever Windows thinks UTF-8 file
 extensions should be... .utf8 ??? Whatever Notepad uses.

 Next, forget what Windows desktop tells you.  It's bull.

 When you get the data back out of the file, what format is it?

 PS You may be confusing Windows by writing UTF-8 without the BOM, and so
 Windows then thinks it's iso-8859-1, because nothing marks the file as UTF-8
 any more (the bytes themselves are still valid UTF-8).  You can make Windows
 happy; or you can make W3C happy.  Not both.

 -- 
 Like Music?
 http://l-i-e.com/artists.htm 

-- 
PHP General Mailing List (http://www.php.net/)
To unsubscribe, visit: http://www.php.net/unsub.php



Re: [PHP] Is it possible to save a file with UTF-8 encoding and no BOM using PHP?

2005-04-29 Thread Rasmus Lerdorf
Jon M. wrote:
No matter what I do to the strings to encode them in whatever format before 
using fwrite, it ALWAYS seems to end up writing the actual file in 
iso-8859-1.

Isn't the encoding of the characters in PHP's strings, and the encoding of 
the actual binary file on your hard drive, two totally different things? Or 
am I just misinformed?
A file is completely defined by its contents.  If you fwrite UTF-8 data 
to the file, then it is a UTF-8 file.  Whether your editor, or whatever 
it is you are using to determine the file is being written as iso-8859-1 
is smart enough to pick this up is a completely different question.

Why don't you try creating the same contents with PHP and with your 
preferred text editor and then compare the contents.  Perhaps your 
editor is dropping a hint somewhere in it that you are not writing to 
the file from PHP.
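
A minimal sketch of that comparison, assuming PHP 7 or later. The helper
name hex_head and the temporary files are made up for illustration; the
second file only stands in for what an editor might save (here, UTF-8 with
a BOM), it is not real editor output.

```php
<?php
// Hypothetical helper: hex-dump the first bytes of a file so two files
// can be compared byte for byte, independent of any editor's guess.
function hex_head(string $path, int $length = 16): string
{
    $bytes = file_get_contents($path, false, null, 0, $length);
    if ($bytes === false) {
        throw new RuntimeException("Cannot read $path");
    }
    return bin2hex($bytes);
}

$dir = sys_get_temp_dir();
$fromPhp = tempnam($dir, 'php');
$fromEditor = tempnam($dir, 'edt');

// "é" in UTF-8 is the two bytes C3 A9 (in ISO-8859-1 it would be E9).
file_put_contents($fromPhp, "caf\xC3\xA9");                // UTF-8, no BOM
file_put_contents($fromEditor, "\xEF\xBB\xBFcaf\xC3\xA9"); // UTF-8 with BOM

$hexPhp = hex_head($fromPhp);
$hexEditor = hex_head($fromEditor);
unlink($fromPhp);
unlink($fromEditor);

echo $hexPhp, "\n";     // 636166c3a9
echo $hexEditor, "\n";  // efbbbf636166c3a9
```

If the two dumps differ only by a leading efbbbf, the BOM is the hint the
editor is adding.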

-Rasmus


Re: [PHP] Is it possible to save a file with UTF-8 encoding and no BOM using PHP?

2005-04-28 Thread Jon M.
Wow, nice to hear from the guy who created PHP! :)



I need to make 2 things clearer though:



1. First I want to know how to make the actual binary file that is written, 
to be encoded as UTF-8. This is my PRIMARY question.



2. How to have fwrite write the file without the BOM. (to make it more 
compatible with certain XML parsers -per the W3C's recommendation).



No matter what I do to the strings to encode them in whatever format before 
using fwrite, it ALWAYS seems to end up writing the actual file in 
iso-8859-1.

Isn't the encoding of the characters in PHP's strings, and the encoding of 
the actual binary file on your hard drive, two totally different things? Or 
am I just misinformed?



Example: When I open up Windows Notepad, then I type some stuff into it, 
and then I choose file, save as, encoding: UTF-8, then I click save.



So how do I do this SAME thing using PHP?



Could someone give me an actual code example of how to do that? I'm just 
so lost.



How do you actually control the way the binary file itself is written, and 
not just the text that is saved in the file?




Rasmus Lerdorf [EMAIL PROTECTED] wrote in message 
news:[EMAIL PROTECTED]
 Jon M. wrote:
 Richard Lynch [EMAIL PROTECTED] wrote in message 
 news:[EMAIL PROTECTED]

On Thu, April 21, 2005 5:07 pm, Jon M. said:

I am trying to have a file that I generated with PHP saved as UTF-8 without
the BOM (Byte Order Mark). Does PHP do anything like this? I am a beginner
with PHP, but very technically experienced otherwise. I'm talking about the
FILE encoding here -just to be clear.

e.g. fopen("what_ever_file", "a+") now I want PHP to save the file itself
with UTF-8, NOT the system default.

I have searched for hours to find an answer, but have not found any info on
the subject.

Does PHP have any ability to create a text file saved in UTF-8 encoding???

Maybe I'm just being dumb, but I think if you UTF-8 encode your data, and
http://php.net/fwrite it, you're gonna get what you want...

Dunno about the Byte-Order-Mark part, but I guess you could strip it out
of the UTF-8 encoded data before writing, if you wanted to.

-- 
Like Music?
http://l-i-e.com/artists.htm



 That was the first thing I tried, and it doesn't seem to work (it always 
 saves in windows default encoding). Unless I'm missing something about what 
 you can do with fwrite. Did you actually test that before you replied, 
 and found that it would? If so, how? The Byte Order Mark is a part of the 
 binary file that is written, how would one go about stripping it out??

 BTW, note to PHP developers: If fwrite had an encoding parameter, e.g. 
 UTF-8, that would be REALLY handy.

 Strings in PHP are binary-safe and character-encoding neutral.  fwrite 
 doesn't have a clue what it is writing, it just writes what is in memory.

 I'd question why you would want to strip the BOM.  Any modern system deals 
 with the byte-order-mark correctly.  But you can simply strip it manually 
 if it is present in the first 3 bytes (EF BB BF for UTF-8) before your 
 fwrite if you really need to get rid of it.
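
 A rough sketch of stripping it by hand, assuming PHP 7 or later and UTF-8
 data (whose BOM is the three bytes EF BB BF). The names UTF8_BOM and
 strip_utf8_bom are invented for this example; PHP has no built-in with
 that name.

```php
<?php
// The UTF-8 byte order mark: three bytes, EF BB BF.
const UTF8_BOM = "\xEF\xBB\xBF";

// Hypothetical helper: return $data without a leading UTF-8 BOM.
// Data that does not start with the BOM is returned unchanged.
function strip_utf8_bom(string $data): string
{
    if (strncmp($data, UTF8_BOM, 3) === 0) {
        return (string) substr($data, 3);
    }
    return $data;
}

$withBom = UTF8_BOM . "<root/>";
$clean = strip_utf8_bom($withBom);

// fwrite stores exactly the bytes it is handed, so the file has no BOM.
$path = tempnam(sys_get_temp_dir(), 'bom');
$fh = fopen($path, 'wb');
fwrite($fh, $clean);
fclose($fh);

$written = file_get_contents($path);
unlink($path);

echo bin2hex(substr($written, 0, 3)), "\n"; // 3c726f ("<ro"), not efbbbf
```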

 -Rasmus 




Re: [PHP] Is it possible to save a file with UTF-8 encoding and no BOM using PHP?

2005-04-28 Thread Jon Hill
 1. First I want to know how to make the actual binary file that is written,
 to be encoded as UTF-8. This is my PRIMARY question.
I think this part is the key. If the string originated from your text editor, 
then your text editor needs to be in UTF-8 mode. If you are importing the 
data from a file or database then you need to be sure that the data is UTF-8  
encoded. If you know that the encoding is instead ISO-8859-1 then you can 
convert it to UTF-8 with the PHP function utf8_encode. fwrite will then write 
out the bytes that you presented to it. I guess the same applies in reverse 
when you read the file back in to the application that will view the text.
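
The convert-then-write step might look roughly like this, assuming PHP 7
or later with the iconv extension available. iconv is swapped in here for
utf8_encode, which is deprecated as of PHP 8.2; the helper name
write_latin1_as_utf8 is made up for the example.

```php
<?php
// Hypothetical helper: convert ISO-8859-1 bytes to UTF-8, then let fwrite
// store the result as-is. Returns the number of bytes written.
function write_latin1_as_utf8(string $path, string $latin1): int
{
    $utf8 = iconv('ISO-8859-1', 'UTF-8', $latin1);
    if ($utf8 === false) {
        throw new RuntimeException('Conversion failed');
    }
    $fh = fopen($path, 'wb'); // 'b': no line-ending translation on Windows
    if ($fh === false) {
        throw new RuntimeException("Cannot open $path");
    }
    $written = fwrite($fh, $utf8);
    fclose($fh);
    if ($written === false) {
        throw new RuntimeException("Cannot write $path");
    }
    return $written;
}

$path = tempnam(sys_get_temp_dir(), 'utf');
$count = write_latin1_as_utf8($path, "caf\xE9"); // "café" in ISO-8859-1, 4 bytes
$hex = bin2hex(file_get_contents($path));
unlink($path);

echo $count, "\n"; // 5: the single byte E9 became the two bytes C3 A9
echo $hex, "\n";   // 636166c3a9, with no BOM in front
```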




Re: [PHP] Is it possible to save a file with UTF-8 encoding and no BOM using PHP?

2005-04-28 Thread Richard Lynch
On Thu, April 28, 2005 4:14 am, Jon M. said:
 No matter what I do to the strings to encode them in whatever format before
 using fwrite, it ALWAYS seems to end up writing the actual file in
 iso-8859-1.

How do you know?

What are you using to determine the format of the file?

We are contending that either you are *not* writing UTF-8 data, but are
writing iso-8859-1 data, or the software telling you that it's not UTF-8
is just plain *wrong*

fwrite just takes your data and dumps it on the hard drive.

It doesn't know UTF-8 from U2.

 Isn't the encoding of the characters in PHP's strings, and the encoding of
 the actual binary file on your hard drive, two totally different things? Or
 am I just misinformed?

You are mis-informed.

 How do you actually control the way the binary file itself is written, and
 not just the text that is saved in the file?

If you are using Windows, then *WINDOWS* is, perhaps, guessing on the
binary format based on the file 'extension' (.txt) and on the contents.

First, try renaming the file to, err, whatever Windows thinks UTF-8 file
extensions should be... .utf8 ??? Whatever Notepad uses.

Next, forget what Windows desktop tells you.  It's bull.

When you get the data back out of the file, what format is it?

PS You may be confusing Windows by writing UTF-8 without the BOM, and so
Windows then thinks it's iso-8859-1, because nothing marks the file as UTF-8
any more (the bytes themselves are still valid UTF-8).  You can make Windows
happy; or you can make W3C happy.  Not both.
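
One way to answer "what format is it?" without trusting the desktop is to
look at the bytes directly. This is only a sketch, assuming PHP 7 or later;
describe_bytes is a made-up name. It relies on PCRE's /u modifier, which
makes preg_match fail on a subject that is not valid UTF-8.

```php
<?php
// Hypothetical helper: classify raw bytes read back from a file.
function describe_bytes(string $data): string
{
    $hasBom = strncmp($data, "\xEF\xBB\xBF", 3) === 0;
    // With /u, preg_match returns false when $data is not valid UTF-8.
    $validUtf8 = preg_match('//u', $data) === 1;

    if ($hasBom && $validUtf8) {
        return 'UTF-8 with BOM';
    }
    if ($validUtf8) {
        return 'valid UTF-8, no BOM'; // also true of pure ASCII
    }
    return 'not UTF-8';
}

echo describe_bytes("\xEF\xBB\xBFcaf\xC3\xA9"), "\n"; // UTF-8 with BOM
echo describe_bytes("caf\xC3\xA9"), "\n";             // valid UTF-8, no BOM
echo describe_bytes("caf\xE9"), "\n";                 // not UTF-8
```

"not UTF-8" does not prove the data is iso-8859-1; it only rules UTF-8 out.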

-- 
Like Music?
http://l-i-e.com/artists.htm
