Re: sed converts 8-bit input text to 16-bit (Unicode-16?) characters - how to suppress that?

2009-03-30 Thread Corinna Vinschen
On Mar 30 13:48, Michael Moser wrote:
 I need to mangle a file containing 8-bit ASCII characters (i.e. the
 file contains also characters in the upper 8-bit range, namely a few
 umlauts as well as some french accented characters). 
 
 Strangely enough, the SED version that came as part of cygwin emits the
 result of the mangling using 16-bit characters (I believe those are
 Unicode-16 characters, but not sure. The Hexeditor shows each second
 byte as always 00, except for the first two bytes which read FF FE).

This is very likely not Cygwin's sed.  Do you have another sed in $PATH
by any chance?  I tried with input files containing german umlauts and
sed does not convert to wide char and it does not produce a BOM marker
at the start of the file.


Corinna

-- 
Corinna Vinschen  Please, send mails regarding Cygwin to
Cygwin Project Co-Leader  cygwin AT cygwin DOT com
Red Hat

--
Unsubscribe info:  http://cygwin.com/ml/#unsubscribe-simple
Problem reports:   http://cygwin.com/problems.html
Documentation: http://cygwin.com/docs.html
FAQ:   http://cygwin.com/faq/



Re: sed converts 8-bit input text to 16-bit (Unicode-16?) characters - how to suppress that?

2009-03-30 Thread Dave Korn
Corinna Vinschen wrote:
 On Mar 30 13:48, Michael Moser wrote:
 I need to mangle a file containing 8-bit ASCII characters (i.e. the
 file contains also characters in the upper 8-bit range, namely a few
 umlauts as well as some french accented characters). 

 Strangely enough, the SED version that came as part of cygwin emits the
 result of the mangling using 16-bit characters (I believe those are
 Unicode-16 characters, but not sure. The Hexeditor shows each second
 byte as always 00, except for the first two bytes which read FF FE).
 
 This is very likely not Cygwin's sed.  Do you have another sed in $PATH
 by any chance?  I tried with input files containing german umlauts and
 sed does not convert to wide char and it does not produce a BOM marker
 at the start of the file.

  Another possibility is that WordPad or Notepad has tried to be clever and
gone and unexpectedly saved the original source file in UTF-16.  Did you verify
the original source file in a hex editor too, Michael?
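
Checking for a byte order mark does not strictly need a hex editor. As a
minimal sketch (the names sniff_bom and sniff_file are made up for this
illustration and are not part of any tool mentioned in this thread), a few
lines of Python can report whether a file starts with a Unicode BOM:

```python
import codecs

# Known byte order marks. The UTF-32 marks come first because the
# UTF-32-LE BOM (FF FE 00 00) begins with the UTF-16-LE BOM (FF FE).
BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]


def sniff_bom(data: bytes):
    """Return the encoding implied by a leading BOM, or None if there is none."""
    for bom, name in BOMS:
        if data.startswith(bom):
            return name
    return None


def sniff_file(path):
    """Read the first four bytes of a file and check them for a BOM."""
    with open(path, "rb") as fh:
        return sniff_bom(fh.read(4))
```

A file whose first bytes are FF FE followed by text would be reported as
utf-16-le; a plain 8-bit file yields None. A missing BOM does not prove that
a file is single-byte, so this is only a first check.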

cheers,
  DaveK




RE: sed converts 8-bit input text to 16-bit (Unicode-16?) characters - how to suppress that?

2009-03-30 Thread Mike Marchywka







 Date: Mon, 30 Mar 2009 14:10:43 +0200
 From: corinna-cyg...@cygwin.com
 To: cygwin@cygwin.com
 Subject: Re: sed converts 8-bit input text to 16-bit (Unicode-16?) characters 
 - how to suppress that?

 On Mar 30 13:48, Michael Moser wrote:
 I need to mangle a file containing 8-bit ASCII characters (i.e. the
 file contains also characters in the upper 8-bit range, namely a few
 umlauts as well as some french accented characters).

 Strangely enough, the SED version that came as part of cygwin emits the
 result of the mangling using 16-bit characters (I believe those are
 Unicode-16 characters, but not sure. The Hexeditor shows each second
 byte as always 00, except for the first two bytes which read FF FE).

 This is very likely not Cygwin's sed. Do you have another sed in $PATH
 by any chance? I tried with input files containing german umlauts and
 sed does not convert to wide char and it does not produce a BOM marker
 at the start of the file.


On a related note, 'which' sometimes gives (or gave) confusing results if
you don't have the relevant x permissions set
(chmod 777 fixes everything, LOL).

I have never seen this problem with sed; in fact, I use it with no
special options to replace 0x00 in unicode files and it works fine
(apparently the windoze registry is unicode, and if you ever need a set
of verbose, highly redundant strings, it is a good place to look).



 Corinna




RE: sed converts 8-bit input text to 16-bit (Unicode-16?) characters - how to suppress that?

2009-03-30 Thread Michael Moser


 -Original Message-
 From: cygwin-ow...@cygwin.com 
 [mailto:cygwin-ow...@cygwin.com] On Behalf Of Corinna Vinschen
 Sent: Montag, 30. März 2009 14:11
 To: cygwin@cygwin.com
 Subject: Re: sed converts 8-bit input text to 16-bit 
 (Unicode-16?) characters - how to suppress that?
  ... 
 This is very likely not Cygwin's sed.  Do you have another 
 sed in $PATH by any chance? 

I searched and did find another sed on my disk, but that was not in
the path. So - yes - I *am* using Cygwin's sed.

 I tried with input files 
 containing german umlauts and sed does not convert to wide 
 char and it does not produce a BOM marker at the start of the file.

Maybe that conversion comes from me redirecting the output to a file
using
'sed {options} > filename.ext'?!? I'll have to verify that!

There is no option to explicitly specify an output file, is there?

Michael






RE: sed converts 8-bit input text to 16-bit (Unicode-16?) characters - how to suppress that?

2009-03-30 Thread Michael Moser

 -Original Message-
 From: cygwin-ow...@cygwin.com 
 [mailto:cygwin-ow...@cygwin.com] On Behalf Of Dave Korn
 Sent: Montag, 30. März 2009 14:46
 To: cygwin@cygwin.com
 Subject: Re: sed converts 8-bit input text to 16-bit 
 (Unicode-16?) characters - how to suppress that?
  ...
 Another possibility is that wordpad or notepad has tried to 
 be clever and gone and unexpectedly saved the original source 
 file in UTF16.  Did you verify the original source file in a 
 hexeditor too, Michael?

Yes - I did. The input is strictly one byte/octet per character and
starts with
4e 61 6d 65 09 ...  (= Name<TAB>...)

The output starts with:
ff fe 4e 00 61 00 6d 00 65 00 09 00 ... 
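
Those bytes are what one would expect if the same text had been re-encoded
as little-endian UTF-16 with a byte order mark. A small Python sketch,
assuming the 8-bit input is Latin-1 (the thread does not say which code page
was actually in use), reproduces both dumps:

```python
import codecs

text = "Name\t"

# One byte per character, as in the input file (Latin-1 is an assumption).
single_byte = text.encode("latin-1")

# Two bytes per character, low byte first, preceded by the BOM FF FE.
utf16_with_bom = codecs.BOM_UTF16_LE + text.encode("utf-16-le")

print(single_byte.hex(" "))     # 4e 61 6d 65 09
print(utf16_with_bom.hex(" "))  # ff fe 4e 00 61 00 6d 00 65 00 09 00
```

For characters up to U+00FF, which covers the umlauts and French accented
letters in Latin-1, the second byte of each pair is 00, matching the
observation that every second byte is zero.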


Michael





RE: sed converts 8-bit input text to 16-bit (Unicode-16?) characters - how to suppress that?

2009-03-30 Thread Mike Marchywka



 From: michael.mo...@sunrise.ch
 To: cygwin@cygwin.com
 Subject: RE: sed converts 8-bit input text to 16-bit (Unicode-16?) characters 
 - how to suppress that?
 Date: Mon, 30 Mar 2009 17:56:58 +0200


 -Original Message-
 From: cygwin-ow...@cygwin.com
 [mailto:cygwin-ow...@cygwin.com] On Behalf Of Dave Korn
 Sent: Montag, 30. März 2009 14:46
 To: cygwin@cygwin.com
 Subject: Re: sed converts 8-bit input text to 16-bit
 (Unicode-16?) characters - how to suppress that?
 ...
 Another possibility is that wordpad or notepad has tried to
 be clever and gone and unexpectedly saved the original source
 file in UTF16. Did you verify the original source file in a
 hexeditor too, Michael?

 Yes - I did. The input is strictly one byte/octet per character and
 starts with
 4e 61 6d 65 09 ... (= Name<TAB>...)

 The output starts with:
 ff fe 4e 00 61 00 6d 00 65 00 09 00 ... 


try  sed --version 



 Michael





Re: sed converts 8-bit input text to 16-bit (Unicode-16?) characters - how to suppress that?

2009-03-30 Thread Dave Korn
Michael Moser wrote:
 
 -Original Message-

  BTW please trim the redundant headers... it's really considerate not to post
people's email addresses in the body of your post because if you do so they
get harvested by spammers.

 I tried with input files 
 containing german umlauts and sed does not convert to wide 
 char and it does not produce a BOM marker at the start of the file.
 
 Maybe that conversion comes from me redirecting the output to a file
 using
 'sed {options} > filename.ext'?!? I'll have to verify that!

  What terminal/console are you using?  Unicode RXVT by any chance?  How does
it behave in other consoles?

  You can edit a file in-place by using the -i option to sed.
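
Besides -i, another way to keep a shell from touching the bytes is to hand
the program a file handle opened in binary mode, so that no shell
redirection is involved at all. The following is a sketch in Python;
run_to_file is a hypothetical helper, and the sed arguments and file names
in the usage note are placeholders:

```python
import subprocess


def run_to_file(cmd, out_path):
    """Run cmd and write its stdout bytes to out_path without re-encoding.

    cmd is an argument list, so no shell is started; the child process
    writes straight into the file handle opened here in binary mode.
    """
    with open(out_path, "wb") as out:
        subprocess.run(cmd, stdout=out, check=True)
```

A call might look like
run_to_file(["sed", "-e", "s/old/new/", "input.txt"], "output.txt"),
assuming sed is on the PATH. The output file then holds exactly the bytes
that sed wrote.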

cheers,
  DaveK




RE: sed converts 8-bit input text to 16-bit (Unicode-16?) characters - how to suppress that?

2009-03-30 Thread Michael Moser

 -Original Message-
 From: cygwin-ow...@cygwin.com 
 [mailto:cygwin-ow...@cygwin.com] On Behalf Of Dave Korn
 Sent: Montag, 30. März 2009 23:02
 To: cygwin@cygwin.com
 Subject: Re: sed converts 8-bit input text to 16-bit 
 (Unicode-16?) characters - how to suppress that?
 
   BTW please trim the redundant headers... it's really 
 considerate not to post people's email addresses in the body 
 of your post because if you do so they get harvested by spammers.

Ooops - apologies! Thought I routinely did so, but must have
overlooked it this time.
 
   What terminal/console are you using?  Unicode RXVT by any 
 chance?  How does it behave in other consoles?

THAT rang a bell! I am using PowerShell, which - as I just double-checked -
indeed operates in Unicode-16 unless told otherwise. And it came into play
here because I had simply redirected sed's output to a file.
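
A file that has already been written this way can usually be converted back.
The sketch below assumes that the output starts with a UTF-16 BOM and that
every character fits into Latin-1; both are assumptions, and the helper name
utf16_to_single_byte is invented for this example. It also assumes that the
characters were decoded correctly before being written as UTF-16, which the
thread does not confirm.

```python
import codecs


def utf16_to_single_byte(data: bytes, target: str = "latin-1") -> bytes:
    """Decode BOM-prefixed UTF-16 and re-encode it in a single-byte encoding."""
    if not data.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)):
        raise ValueError("input does not start with a UTF-16 BOM")
    # The "utf-16" codec reads the BOM, picks the byte order from it,
    # and leaves the BOM out of the decoded string.
    return data.decode("utf-16").encode(target)
```

Applied to the bytes ff fe 4e 00 61 00 6d 00 65 00 09 00, this returns the
five bytes 4e 61 6d 65 09. Encoding raises UnicodeEncodeError if a character
has no representation in the target encoding, which is a useful signal that
the Latin-1 assumption does not hold.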

Nothing to do with sed at all! Apologies for the wasted bandwidth!

Michael





Re: sed converts 8-bit input text to 16-bit (Unicode-16?) characters - how to suppress that?

2009-03-30 Thread Christopher Faylor
On Mon, Mar 30, 2009 at 11:20:00PM +0200, Michael Moser wrote:

 -Original Message-
 From:
 [mailto:
 Sent:
 To:
 Subject:
 
   BTW please trim the redundant headers... it's really 
 considerate not to post people's email addresses in the body 
 of your post because if you do so they get harvested by spammers.

Ooops - apologies! Thought I routinely did so, but must have
overlooked it this time.

FYI, you did it above too.  None of the above information belongs in the
body of your message.

cgf




RE: sed converts 8-bit input text to 16-bit (Unicode-16?) characters - how to suppress that?

2009-03-30 Thread Michael Moser
 -Original Message-

 
 Ooops - apologies! Thought I routinely did so, but must have
 overlooked it this time.
 
 FYI, you did it above too.  None of the above information 
 belongs in the body of your message.

What - you get someone's email addresses there? Very odd: I swear I
only see the mailing list as the one and only address in the To: field
(and no address in my Cc and Bcc fields) and I also left no email
address in those echoed headers. The only address I left in was in
From:cygwin@cygwin.com, which IMHO - if some harvester has made it
into this newsgroup - he already knows anyway...

Or do you see any further email addresses included again? If so: very
strange - how come?

Michael





Re: sed converts 8-bit input text to 16-bit (Unicode-16?) characters - how to suppress that?

2009-03-30 Thread Larry Hall (Cygwin)

Michael Moser wrote:

-Original Message-


Ooops - apologies! Thought I routinely did so, but must have
overlooked it this time.

FYI, you did it above too.  None of the above information
belongs in the body of your message.


What - you get someone's email addresses there? Very odd: I swear I
only see the mailing list as the one and only address in the To: field
(and no address in my Cc and Bcc fields) and I also left no email
address in those echoed headers. The only address I left in was in
From:cygwin@cygwin.com, which IMHO - if some harvester has made it
into this newsgroup - he already knows anyway...


Take a look at http://cygwin.com/ml/cygwin/2009-03/msg01079.html and
it should be clear what Chris is talking about.  There's really no need
to post any headers at all, regardless of how commonplace you think the
information would be.  They really provide no additional information and
can be used for purposes you don't intend.  Besides being good netiquette,
this list appreciates tidy messages as well. :-)

--
Larry Hall  http://www.rfk.com
RFK Partners, Inc.  (508) 893-9779 - RFK Office
216 Dalton Rd.  (508) 893-9889 - FAX
Holliston, MA 01746

_

A: Yes.
 Q: Are you sure?
 A: Because it reverses the logical flow of conversation.
 Q: Why is top posting annoying in email?
