On 2008-06-10, gmarsha11 wrote:
> Ok, have saved the file with Windows notepad as ANSI, Unicode, Unicode big
> endian, and UTF-8.
>
> Both Unicode options give me the output with the extra spaces. ANSI and
> UTF-8 allow me to see the files as I would expect to see them.
>
> Does this mean it's necessary to change the encoding for any files I might
> need to cat, grep awk, etc.?

I'm no expert on any of this, but as far as I know, all traditional Unix
tools that deal with strings consider a string to be a sequence of 8-bit
characters. So the simple answer is yes: a file saved with either of
Notepad's Unicode options has to be re-saved or converted before those
tools will treat it the way you expect.

The more complete answer is that it depends on what you're using those
files for and what other programs need to read and/or write them.

FWIW, I used Notepad on my Windows XP system to create a file containing
your string, "This is abc file". When I went to save it, the Encoding
was already set to ANSI. In other words, you shouldn't have to do
anything special to save your files in an encoding that grep, etc. can
already handle.

That being said, you really shouldn't use Notepad to edit any files you
expect to use with Cygwin, because Cygwin tools expect lines to end with
LF, not a CR-LF pair. Many tools will consider that CR to be part of the
line. In particular, bash will give odd results if you ask it to execute
a shell script written with Notepad.

I got different results than you did when I cat'd abc.txt. When I saved
it as Unicode, the output of cat was:

    ÿþThis is abc file

When I saved it as Unicode Big Endian, the output of cat was:

    þÿThis is abc file

The only difference between the two was the order of the bytes in the
BOM (Byte Order Mark) at the beginning of each file. In both cases,
there were no extra spaces. I was running bash in an rxvt window, if
that matters.

Regards,
Gary
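
To make the byte-level differences above concrete, here is a minimal
Python sketch. It is not something from the original exchange. It rests
on several assumptions: that Notepad's "ANSI" corresponds to Windows-1252
(the real code page depends on the system locale), that Notepad writes a
BOM for both Unicode options and for UTF-8, and that the helper names
notepad_bytes and to_unix_bytes are purely hypothetical. It shows that
the two Unicode options store each ASCII character as two bytes, one of
them NUL, which may be why some terminals show extra spaces and others
do not.

```python
import codecs

# Sample text as Notepad would write it, with a CR-LF line ending.
TEXT = "This is abc file\r\n"

# Assumed mapping from Notepad's "Encoding" choices to (BOM, Python codec).
# "ANSI" is taken to be Windows-1252, which is locale dependent in reality.
NOTEPAD_ENCODINGS = {
    "ANSI": (b"", "cp1252"),
    "Unicode": (codecs.BOM_UTF16_LE, "utf-16-le"),
    "Unicode big endian": (codecs.BOM_UTF16_BE, "utf-16-be"),
    "UTF-8": (codecs.BOM_UTF8, "utf-8"),
}


def notepad_bytes(text, choice):
    """Return the bytes assumed to be written for a given Encoding choice."""
    bom, codec = NOTEPAD_ENCODINGS[choice]
    return bom + text.encode(codec)


def to_unix_bytes(data):
    """Strip any BOM, decode accordingly, and turn CR-LF into LF.

    Falls back to cp1252 when no BOM is present (an assumption).
    """
    if data.startswith(codecs.BOM_UTF8):
        text = data[len(codecs.BOM_UTF8):].decode("utf-8")
    elif data.startswith(codecs.BOM_UTF16_LE):
        text = data[len(codecs.BOM_UTF16_LE):].decode("utf-16-le")
    elif data.startswith(codecs.BOM_UTF16_BE):
        text = data[len(codecs.BOM_UTF16_BE):].decode("utf-16-be")
    else:
        text = data.decode("cp1252")
    return text.replace("\r\n", "\n").encode("utf-8")


if __name__ == "__main__":
    for choice in NOTEPAD_ENCODINGS:
        raw = notepad_bytes(TEXT, choice)
        # Show the first few raw bytes, then the cleaned-up result.
        print(choice, repr(raw[:6]), "->", repr(to_unix_bytes(raw)))
```

Running to_unix_bytes over a file's contents before handing it to cat,
grep, or awk is one way to get 8-bit text with LF line endings; tools
such as iconv and dos2unix do the same job from the command line.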