Re: sed converts 8-bit input text to 16-bit (Unicode-16?) characters - how to suppress that?
On Mar 30 13:48, Michael Moser wrote:
> I need to mangle a file containing 8-bit ASCII characters (i.e. the file also contains characters in the upper 8-bit range, namely a few umlauts as well as some French accented characters). Strangely enough, the sed version that came as part of Cygwin emits the result of the mangling using 16-bit characters (I believe those are Unicode-16 characters, but I'm not sure. The hex editor shows every second byte as 00, except for the first two bytes, which read FF FE).

This is very likely not Cygwin's sed. Do you have another sed in $PATH by any chance? I tried with input files containing German umlauts, and sed does not convert to wide chars and does not produce a BOM at the start of the file.

Corinna

--
Corinna Vinschen                  Please, send mails regarding Cygwin to
Cygwin Project Co-Leader          cygwin AT cygwin DOT com
Red Hat

--
Unsubscribe info:      http://cygwin.com/ml/#unsubscribe-simple
Problem reports:       http://cygwin.com/problems.html
Documentation:         http://cygwin.com/docs.html
FAQ:                   http://cygwin.com/faq/
Re: sed converts 8-bit input text to 16-bit (Unicode-16?) characters - how to suppress that?
Corinna Vinschen wrote:
> On Mar 30 13:48, Michael Moser wrote:
>> I need to mangle a file containing 8-bit ASCII characters (i.e. the file also contains characters in the upper 8-bit range, namely a few umlauts as well as some French accented characters). Strangely enough, the sed version that came as part of Cygwin emits the result of the mangling using 16-bit characters (I believe those are Unicode-16 characters, but I'm not sure. The hex editor shows every second byte as 00, except for the first two bytes, which read FF FE).
>
> This is very likely not Cygwin's sed. Do you have another sed in $PATH by any chance? I tried with input files containing German umlauts, and sed does not convert to wide chars and does not produce a BOM at the start of the file.

Another possibility is that WordPad or Notepad has tried to be clever and gone and unexpectedly saved the original source file in UTF-16. Did you verify the original source file in a hex editor too, Michael?

cheers,
DaveK
RE: sed converts 8-bit input text to 16-bit (Unicode-16?) characters - how to suppress that?
Date: Mon, 30 Mar 2009 14:10:43 +0200
From: corinna-cyg...@cygwin.com
To: cygwin@cygwin.com
Subject: Re: sed converts 8-bit input text to 16-bit (Unicode-16?) characters - how to suppress that?

> On Mar 30 13:48, Michael Moser wrote:
>> I need to mangle a file containing 8-bit ASCII characters (i.e. the file also contains characters in the upper 8-bit range, namely a few umlauts as well as some French accented characters). Strangely enough, the sed version that came as part of Cygwin emits the result of the mangling using 16-bit characters (I believe those are Unicode-16 characters, but I'm not sure. The hex editor shows every second byte as 00, except for the first two bytes, which read FF FE).
>
> This is very likely not Cygwin's sed. Do you have another sed in $PATH by any chance? I tried with input files containing German umlauts, and sed does not convert to wide chars and does not produce a BOM at the start of the file.

On a related note, 'which' sometimes gives (or gave) confusing results if you don't have the relevant x permissions set (chmod 777 fixes everything, LOL).

I know I've never seen this problem with sed; in fact, I use it with no special options to replace 0x00 in Unicode files and it works fine (apparently the windoze registry is Unicode, and if you ever need a set of verbose, highly redundant strings, it is a good place to look).
RE: sed converts 8-bit input text to 16-bit (Unicode-16?) characters - how to suppress that?
-----Original Message-----
From: cygwin-ow...@cygwin.com [mailto:cygwin-ow...@cygwin.com] On Behalf Of Corinna Vinschen
Sent: Monday, 30 March 2009 14:11
To: cygwin@cygwin.com
Subject: Re: sed converts 8-bit input text to 16-bit (Unicode-16?) characters - how to suppress that?

...
> This is very likely not Cygwin's sed. Do you have another sed in $PATH by any chance?

I searched and did find another sed on my disk, but it was not in the path. So - yes - I *am* using Cygwin's sed.

> I tried with input files containing German umlauts, and sed does not convert to wide chars and does not produce a BOM at the start of the file.

Maybe that conversion comes from my redirecting the output to a file using 'sed {options} filename.ext'?!? I'll have to verify that! There is no option to explicitly specify an output file, is there?

Michael
RE: sed converts 8-bit input text to 16-bit (Unicode-16?) characters - how to suppress that?
-----Original Message-----
From: cygwin-ow...@cygwin.com [mailto:cygwin-ow...@cygwin.com] On Behalf Of Dave Korn
Sent: Monday, 30 March 2009 14:46
To: cygwin@cygwin.com
Subject: Re: sed converts 8-bit input text to 16-bit (Unicode-16?) characters - how to suppress that?

...
> Another possibility is that WordPad or Notepad has tried to be clever and gone and unexpectedly saved the original source file in UTF-16. Did you verify the original source file in a hex editor too, Michael?

Yes - I did. The input is strictly one byte/octet per character and starts with:

  4e 61 6d 65 09 ... (= Name<TAB>...)

The output starts with:

  ff fe 4e 00 61 00 6d 00 65 00 09 00 ...

Michael
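The two hex dumps in this message are what one would expect if the same text had been re-encoded as UTF-16 little-endian with a leading byte order mark (FF FE). The following is only a minimal Python sketch of that relationship, not something posted in the thread; the helper name looks_like_utf16le is made up for illustration, and decoding the input as Latin-1 is an assumption (any 8-bit code page agrees on these five ASCII bytes).

```python
import codecs


def looks_like_utf16le(data: bytes) -> bool:
    """Return True if data starts with the UTF-16 little-endian BOM (FF FE)."""
    return data.startswith(codecs.BOM_UTF16_LE)


# First five bytes of the input file as reported: "Name" followed by a TAB.
original = bytes.fromhex("4e 61 6d 65 09")

# Assumption: treat the 8-bit input as Latin-1, then widen every character
# to two bytes (UTF-16LE) and prepend the BOM.
converted = codecs.BOM_UTF16_LE + original.decode("latin-1").encode("utf-16-le")

# Print the result in the same space-separated hex style as the message.
print(" ".join(f"{b:02x}" for b in converted))
```

Checking the first two bytes of a file with a helper like this is a quick way to tell whether something between the tool and the disk has widened the output.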
RE: sed converts 8-bit input text to 16-bit (Unicode-16?) characters - how to suppress that?
From: michael.mo...@sunrise.ch
To: cygwin@cygwin.com
Subject: RE: sed converts 8-bit input text to 16-bit (Unicode-16?) characters - how to suppress that?
Date: Mon, 30 Mar 2009 17:56:58 +0200

> -----Original Message-----
> From: cygwin-ow...@cygwin.com [mailto:cygwin-ow...@cygwin.com] On Behalf Of Dave Korn
> Sent: Monday, 30 March 2009 14:46
> To: cygwin@cygwin.com
> Subject: Re: sed converts 8-bit input text to 16-bit (Unicode-16?) characters - how to suppress that?
>
> ...
>> Another possibility is that WordPad or Notepad has tried to be clever and gone and unexpectedly saved the original source file in UTF-16. Did you verify the original source file in a hex editor too, Michael?
>
> Yes - I did. The input is strictly one byte/octet per character and starts with 4e 61 6d 65 09 ... (= Name<TAB>...)
> The output starts with: ff fe 4e 00 61 00 6d 00 65 00 09 00 ...

Try sed --version.
Re: sed converts 8-bit input text to 16-bit (Unicode-16?) characters - how to suppress that?
Michael Moser wrote:
> -----Original Message-----

BTW, please trim the redundant headers... it's really considerate not to post people's email addresses in the body of your post, because if you do so they get harvested by spammers.

>> I tried with input files containing German umlauts, and sed does not convert to wide chars and does not produce a BOM at the start of the file.
>
> Maybe that conversion comes from my redirecting the output to a file using 'sed {options} filename.ext'?!? I'll have to verify that!

What terminal/console are you using? Unicode RXVT by any chance? How does it behave in other consoles?

You can edit a file in place by using the -i option to sed.

cheers,
DaveK
RE: sed converts 8-bit input text to 16-bit (Unicode-16?) characters - how to suppress that?
-----Original Message-----
From: cygwin-ow...@cygwin.com [mailto:cygwin-ow...@cygwin.com] On Behalf Of Dave Korn
Sent: Monday, 30 March 2009 23:02
To: cygwin@cygwin.com
Subject: Re: sed converts 8-bit input text to 16-bit (Unicode-16?) characters - how to suppress that?

> BTW, please trim the redundant headers... it's really considerate not to post people's email addresses in the body of your post, because if you do so they get harvested by spammers.

Oops - apologies! I thought I routinely did so, but I must have overlooked it this time.

> What terminal/console are you using? Unicode RXVT by any chance? How does it behave in other consoles?

THAT rang the bell! I am using PowerShell, which - as I just double-checked - indeed operates in Unicode-16 unless told otherwise. And it came into play here because I had simply redirected sed's output to a file. Nothing to do with sed at all!

Apologies for the wasted bandwidth!

Michael
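For a file that has already been written this way, the widening can be undone after the fact. The sketch below is not from the thread; it is a hedged Python example in which the function name utf16_to_8bit, the file names, and the choice of Windows-1252 as the target encoding are all assumptions. It only reverses the UTF-16 encoding; whether the accented characters survived the redirect intact depends on how the shell decoded sed's output, which the thread does not say.

```python
import codecs
import tempfile
from pathlib import Path


def utf16_to_8bit(src: Path, dst: Path, encoding: str = "cp1252") -> None:
    """Rewrite a UTF-16 file that starts with a BOM as a single-byte encoded file.

    The target encoding is an assumption; pass the code page of the
    original 8-bit data if it differs.
    """
    # The "utf-16" codec reads the BOM, picks the byte order, and drops the BOM.
    text = src.read_bytes().decode("utf-16")
    # Work on raw bytes so that line endings are passed through unchanged.
    dst.write_bytes(text.encode(encoding))


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        # Hypothetical file names, created only for this demonstration.
        src = Path(tmp) / "out-utf16.txt"
        dst = Path(tmp) / "out-8bit.txt"
        sample = "Name\tM\u00fcller\r\n"  # contains one umlaut
        src.write_bytes(codecs.BOM_UTF16_LE + sample.encode("utf-16-le"))
        utf16_to_8bit(src, dst)
        print(dst.read_bytes())
```

Reading and writing bytes rather than text keeps the conversion limited to the character encoding, so CR/LF pairs in the redirected file are not altered.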
Re: sed converts 8-bit input text to 16-bit (Unicode-16?) characters - how to suppress that?
On Mon, Mar 30, 2009 at 11:20:00PM +0200, Michael Moser wrote:
> -----Original Message-----
> From: [mailto:
> Sent:
> To:
> Subject:
>
>> BTW, please trim the redundant headers... it's really considerate not to post people's email addresses in the body of your post, because if you do so they get harvested by spammers.
>
> Oops - apologies! I thought I routinely did so, but I must have overlooked it this time.

FYI, you did it above too. None of the above information belongs in the body of your message.

cgf
RE: sed converts 8-bit input text to 16-bit (Unicode-16?) characters - how to suppress that?
-----Original Message-----
>> Oops - apologies! I thought I routinely did so, but I must have overlooked it this time.
>
> FYI, you did it above too. None of the above information belongs in the body of your message.

What - you get someone's email addresses there? Very odd: I swear I only see the mailing list as the one and only address in the To: field (and no address in my Cc and Bcc fields), and I also left no email address in those echoed headers. The only address I left in was in From:cygwin@cygwin.com, which IMHO - if some harvester has made it into this newsgroup - he already knows anyway...

Or do you see any further email addresses included again? If so: very strange - how come?

Michael
Re: sed converts 8-bit input text to 16-bit (Unicode-16?) characters - how to suppress that?
Michael Moser wrote:
> -----Original Message-----
>>> Oops - apologies! I thought I routinely did so, but I must have overlooked it this time.
>>
>> FYI, you did it above too. None of the above information belongs in the body of your message.
>
> What - you get someone's email addresses there? Very odd: I swear I only see the mailing list as the one and only address in the To: field (and no address in my Cc and Bcc fields), and I also left no email address in those echoed headers. The only address I left in was in From:cygwin@cygwin.com, which IMHO - if some harvester has made it into this newsgroup - he already knows anyway...

Take a look at http://cygwin.com/ml/cygwin/2009-03/msg01079.html and it should be clear what Chris is talking about. There's really no need to post any headers at all, regardless of how commonplace you think the information would be. They really provide no additional information and can be used for purposes you don't intend. Besides being good netiquette, this list appreciates tidy messages as well. :-)

--
Larry Hall                              http://www.rfk.com
RFK Partners, Inc.                      (508) 893-9779 - RFK Office
216 Dalton Rd.                          (508) 893-9889 - FAX
Holliston, MA 01746

A: Yes.
> Q: Are you sure?
>> A: Because it reverses the logical flow of conversation.
>>> Q: Why is top posting annoying in email?