Read and respond to this message at: 
https://sourceforge.net/forum/message.php?msg_id=3982047
By: keithmarshall

This is an artifact from the days of MS-DOS v1.x.

In that version of MS-DOS, all disk files were stored as a collection of fixed
length blocks (of 128 bytes each, IIRC).  Since there was only a 1 in 128
probability that any file would *exactly* fill out the last block allocated,
that block was padded with random garbage after the last logical byte in the
file, and the true logical end of file was marked by placing a 0x1A byte after
it (or maybe all the padding bytes were set to 0x1A; it was a long time ago,
and my recollection is hazy).

All tools designed to read *text* files were coded to recognize the first 0x1A
byte read on input as being beyond the logical end of file, and would
immediately return EOF, reading no more data; that convention is still honoured
today, in *all* versions of Windoze, for reading from files opened in text
mode.  Being a text mode tool, `sed' will see the first 0x1A byte encountered
as a hard end of file marker, and will read neither it nor anything beyond it.
(I'm not aware of any command line option to cause `sed' to open its input
stream in binary mode, and can see nothing appropriate in the `info' manual;
not surprising really, for there is no text/binary distinction on *nix, whence
`sed' originates.)
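
To check whether a given file is affected at all, you can count its 0x1A bytes
with a tool that passes raw bytes through.  This is only a minimal sketch,
assuming a POSIX shell with `tr' and `wc' on the path; the function name
`count_ctrl_z' is made up for illustration, and is not part of any package
mentioned here:

```shell
# count_ctrl_z: report how many 0x1A (octal 032) bytes arrive on standard input.
count_ctrl_z() {
  # -c complements the set and -d deletes, so only the 0x1A bytes survive;
  # wc -c counts them, and the final tr strips the padding some wc builds add.
  tr -cd '\032' | wc -c | tr -d ' '
}

# Example: a short sample stream with two embedded 0x1A bytes; prints 2.
printf 'one\032two\032three\n' | count_ctrl_z
```

On a real file that would be `count_ctrl_z < infile'.  Bear in mind that if
the tool doing the counting is itself reading in text mode, it may stop at the
first 0x1A just as `sed' does, so a low count from such a tool is not
conclusive.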

FWIW, this is why Windoze command line tools use `Ctrl-Z', which generates a
0x1A byte, as the EOF signal on standard input, where *nix uses `Ctrl-D'
(0x04).

If all you are interested in doing is removing special characters, `tr' is a
better choice than `sed' in any case; even to do a one for one transliteration,
`tr' is still the better choice.  If you need some extra capability, which `sed'
can provide but `tr' can't, then you will have to do something like:

  cat infile | tr -d "\032" | sed ...

to filter out the 0x1A bytes (032 in octal) before passing the residual input
stream to `sed'.  (Of course, this means your `sed' script will never see the
0x1A bytes.)
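
To see the effect of that filter on a known input, here is a small sketch,
assuming a POSIX shell; the `strip_ctrl_z' name, the sample text and the `sed'
expression are all invented for the demonstration:

```shell
# strip_ctrl_z: delete every 0x1A byte (octal 032) from standard input.
strip_ctrl_z() {
  tr -d '\032'
}

# The sample has a 0x1A byte between the two words; once it is filtered
# out, sed sees the whole line and can edit the text that followed it.
# Prints: before-AFTER
printf 'before\032after\n' | strip_ctrl_z | sed 's/after/-AFTER/'
```

If you would rather keep a visible trace of where the 0x1A bytes were, a one
for one transliteration such as `tr "\032" "\n"' in place of the `tr -d' would
turn each of them into a newline instead of dropping it.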

BTW, the MSYS implementation of `sed' for Windoze (see www.mingw.org) doesn't
have this limitation; it has been patched to treat the input stream as *binary*
data, even though it is normally text.  Unfortunately, in the current
MSYS-1.0.10 release, the `sed' provided is a rather dated version; it doesn't
understand the '\xnn' notation (nor even the '\0nn' octal notation) for special
characters, but then, POSIX `sed' doesn't require this, so it is not a portable
construct anyway.
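
One portable way around the missing '\xnn' notation is to let the shell build
the byte and paste it into the `sed' script literally.  What follows is only a
sketch, assuming a POSIX shell and a `sed' which actually reads past the 0x1A
(as on *nix, or with a binary mode build); `ctrl_z' and `strip_ctrl_z_sed' are
hypothetical names:

```shell
# Hold a literal 0x1A byte in a shell variable; printf understands the
# \032 octal escape even where sed does not.
ctrl_z=$(printf '\032')

# strip_ctrl_z_sed: remove every 0x1A byte using sed alone.  The double
# quotes let the shell expand ${ctrl_z} before sed parses the script.
strip_ctrl_z_sed() {
  sed "s/${ctrl_z}//g"
}

# Prints: beforeafter
printf 'before\032after\n' | strip_ctrl_z_sed
```

The same variable can go on the right hand side of a substitution too, should
you want to put a 0x1A byte into the output rather than take one out.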

HTH,
Keith.

_______________________________________________
GnuWin32-Users mailing list
GnuWin32-Users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/gnuwin32-users
