On Tue, Dec 11, 2001 at 11:34:09AM -0800, Brian Stell wrote:
> Jalal,
>
> Kindly reply via the mailing list so others can see the discussion.
> That way others can benefit and/or help.
>
> BOM is the Byte Order Mark used in Unicode to indicate an
> important detail about the Unicode data stream.
>
> Perhaps the Perl people can describe how to inhibit the BOM?

I don't think it's Perl putting the BOM in there. I opened up Notepad
in Win2000, wrote "foobar", and saved the file as "ANSI", "UTF-8",
"Unicode", and "Unicode big endian". Then in UNIX with this

    perl -e 'print "$ARGV[0]: "; print unpack "H*", <>; print "\n"' file.name

I get

    foo.ansi: feff0066006f006f006200610072000d000a
    foo.utf8: efbbbf666f6f6261720d0a
    foo.unic: fffe66006f006f006200610072000d000a
    foo.unib: feff0066006f006f006200610072000d000a

(copied by hand, so typos possible) which looks like big-endian UTF-16,
UTF-8, little-endian UTF-16, and (again) big-endian UTF-16 to me: a
leading 0xFF 0xFE marks little-endian and 0xFE 0xFF marks big-endian.
The "ANSI" line matches the "Unicode big endian" line byte for byte,
which is probably one of those copying typos, since a plain ANSI save
should carry no BOM at all. For example, the "Unicode" file is first the
BOM, then 0x66 aka "f", then two 0x6f:s aka "o", then 0x62 aka "b", and
so on, each followed by a 0x00 byte.

No Perl was involved in creating these files, but the BOMs are there
(the UTF-8 0xEF 0xBB 0xBF is the BOM in disguise: it is U+FEFF encoded
as UTF-8).

Moreover, if the browser claims to do Unicode, it should recognize the
BOM, too, and ignore it in display (but of course use it to figure out
the right endianness).

-- 
$jhi++; # http://www.iki.fi/jhi/
        # There is this special biologist word we use for 'stable'.
        # It is 'dead'. -- Jack Cohen
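
For anyone who wants to check their own files, the BOM test described above can be sketched in a few lines. This is only an illustration, written in Python rather than Perl; the function name sniff_bom and the label strings are made up for this example, and it only knows the three BOMs discussed in this message (UTF-8 and the two UTF-16 byte orders).

```python
import codecs

# The three byte-order marks discussed above, paired with illustrative labels.
BOMS = [
    (codecs.BOM_UTF8, "UTF-8"),                     # EF BB BF
    (codecs.BOM_UTF16_BE, "UTF-16 big-endian"),     # FE FF
    (codecs.BOM_UTF16_LE, "UTF-16 little-endian"),  # FF FE
]


def sniff_bom(data):
    """Return (label, payload), where payload is data with any leading BOM removed.

    Hypothetical helper for illustration; it does not guess the encoding
    of files that carry no BOM.
    """
    for bom, label in BOMS:
        if data.startswith(bom):
            return label, data[len(bom):]
    return "no BOM", data


if __name__ == "__main__":
    # Hex strings taken from the dumps quoted in this message.
    for name, hexdump in [
        ("foo.utf8", "efbbbf666f6f6261720d0a"),
        ("foo.unib", "feff0066006f006f006200610072000d000a"),
    ]:
        label, payload = sniff_bom(bytes.fromhex(hexdump))
        print(name, label, payload.hex())
```

A reader that honors the BOM would decode the payload with the encoding the label names and never display the mark itself.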