Read and respond to this message at:
https://sourceforge.net/forum/message.php?msg_id=3982047
By: keithmarshall
This is an artifact from the days of MS-DOS v1.x. In that version of MS-DOS, all disk files were stored as a collection of fixed-length blocks (of 128 bytes each, IIRC). Since there was only a 1 in 128 probability that any file would *exactly* fill the last block allocated, that block was padded with random garbage after the last logical byte in the file, and the true logical end of file was marked by placing a 0x1A byte after it (or maybe all the padding bytes were set to 0x1A; it was a long time ago, and my recollection is hazy).

All tools designed to read *text* files were coded to recognize the first 0x1A byte read on input as being beyond the logical end of file, and would immediately return EOF, reading no more data; that convention is still honoured today, in *all* versions of Windoze, for reading from files opened in text mode. Being a text mode tool, `sed' will see the first 0x1A byte encountered as a hard end of file marker, and will read neither it nor anything beyond it. (I'm not aware of any command line option to make `sed' open its input stream in binary mode, and can see nothing appropriate in the `info' manual; not surprising really, for there is no text/binary distinction on *nix, whence `sed' originates.)

FWIW, this is why Windoze command line tools use `Ctrl-Z', which generates a 0x1A byte, as the EOF signal on standard input, where *nix uses `Ctrl-D' (0x04).

If all you are interested in doing is removing special characters, `tr' is a better choice than `sed' in any case; even for a one-for-one transliteration, `tr' is still the better choice. If you need some extra capability which `sed' can provide but `tr' can't, then you will have to do something like:

  cat infile | tr -d "\032" | sed ...

to filter out the 0x1A bytes (032 in octal) before passing the residual input stream to `sed' (of course, this means your `sed' script will never see the 0x1A bytes).
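As a rough illustration of the filtering step described above, the following sketch builds a small test file containing a 0x1A byte, strips that byte with `tr', and only then runs `sed'. The file names `sample.txt' and `cleaned.txt' and the `sed' expression are made up for the example, and it assumes a POSIX-style shell (such as MSYS or any *nix shell) providing `printf', `tr', `sed' and `wc'.

```shell
# Hypothetical sample input: the first line has a 0x1A byte (octal 032)
# embedded between "alpha" and "beta".
printf 'alpha\032beta\ngamma\n' > sample.txt

# Delete every 0x1A byte first, then hand the cleaned stream to sed.
# The substitution is only a placeholder for whatever editing is needed.
tr -d '\032' < sample.txt | sed 's/beta/BETA/' > cleaned.txt

# Sanity check: keep only the 0x1A bytes of the result and count them;
# a count of zero means none survived the filter.
tr -cd '\032' < cleaned.txt | wc -c
```

Because the 0x1A bytes are removed before `sed' starts reading, a text mode `sed' has nothing left to mistake for an end of file marker.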
BTW, the MSYS implementation of `sed' for Windoze (see www.mingw.org) doesn't have this limitation; it has been patched to treat the input stream as *binary* data, even though it is normally text. Unfortunately, in the current MSYS-1.0.10 release, the `sed' provided is a rather dated version; it doesn't understand the '\xnn' notation (nor even the '\0nn' octal notation) for special characters, but then, POSIX `sed' doesn't require this, so it is not a portable construct anyway.

HTH,
Keith.

_______________________________________________
GnuWin32-Users mailing list
GnuWin32-Users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/gnuwin32-users