Re: regex matching for ^M in sed

James G. Sack (jim) Sat, 24 May 2008 00:38:01 -0700

Ralph Shumaker wrote:
> I want to do this:
> cat myFile | sed -e "s/[ ]*CTRL-M/\n/g" > myFileCleaned
> where Ctrl-M is character 0x013 and \n is a newline.
> 
> I have a file that has many, many, many long lines, each with many sets
> of data, each set being separated by many spaces, each instance of which
> is ended by character 0x013, kinda like:
> data   ^Mdata   ^Mdata   ^Mdata   ^Mdata   ^M
> but the spaces before the ^M are about 77 in number (seems to be
> consistent), and the data strings are longer, containing several
> elements, each separated by either one, two, or three spaces.  If I can
> match on any number of spaces ("[ ]*") which are immediately followed by
> 0x013 (^M) and replace each instance with a newline, I'll be set (almost
> certainly).


As others have said, the ^M (0x0D, or CR "carriage return") may indicate
you have a DOS-format file, with line endings actually being a CR,LF
combination (0x0D, 0x0A).

  (You said 0x013, but I think you may have been confusing
   decimal with hex, since hex 0x0D = decimal 13.)
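
A quick way to check the hex/decimal mix-up from the shell, as a small
sketch (the variable names are just for illustration):

```shell
# Convert the hex constants to decimal with printf.
dec=$(printf '%d' 0x0D)     # the carriage return
wrong=$(printf '%d' 0x013)  # what "0x013" would actually mean
echo "0x0D  = $dec"
echo "0x013 = $wrong"
```

So 0x0D is decimal 13 (CR), while 0x013 would be decimal 19, a
different control character.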

Of course, you may instead have an old Mac-format file, which uses a
bare CR (with no LF) as the line delimiter.

You should examine a piece of the file to find out for sure. There are
several programs capable of giving (say) hex dumps -- od, hexdump, and
my favorite xxd.

 xxd -g1 -l128 file.txt
will look at the first 128 bytes and give hex (and string) output for
each single byte. If you see something like

  0000000: 68 65 6c 6c 6f 0d 0a 77 6f 72 6c 64 0d 0a        hello..world..

then the '0d 0a' sequences confirm the DOS CR,LF format. A '0d' that is
not followed by '0a' would point to bare-CR line endings instead.
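
If xxd isn't installed, od can do much the same job. Here is a rough
sketch, assuming a made-up sample file called sample.txt (substitute
your own file):

```shell
# Build a small sample with DOS (CR,LF) line endings; sample.txt is a made-up name.
printf 'hello\r\nworld\r\n' > sample.txt

# Dump the first 128 bytes as single hex bytes, without the offset column.
dump=$(od -An -tx1 -N128 sample.txt)
echo "$dump"

# Count how many carriage returns (0x0D) the file contains.
cr_count=$(tr -cd '\r' < sample.txt | wc -c | tr -d ' ')
echo "CR bytes: $cr_count"
```

Comparing the CR count with the line count from wc -l is a quick hint:
roughly equal suggests CR,LF endings, while many CRs and very few lines
suggests bare CRs.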

> 
> I think I recall \n being the equivalent of a newline, although I may be
> confusing things with my brief venture into perl.
> 
> I did man regexp, but didn't find what I wanted.  I'm not sure where
> else to look.
> 
> I'm sure that vim could probably do it, but I have already found that
> trying to search for specific things in that complexity is like looking
> for a tiny stainless steel needle in a humongous haystack.  Magnets
> won't do me any good.
> 
> I have already had dealings with sed and regexp, and figured this would
> be a good opportunity to pick up a new trick.
>

If you _want_ to use sed then man sed is the place to look. :-)

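If you do have a DOS file, you can strip the trailing space characters
and the CR from each line with:
   sed -e 's/ *CR$//' file.old >file.new
where CR stands for a literal carriage return (in most shells, type
Ctrl-V then Ctrl-M; GNU sed also accepts \r in its place).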
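
One way to get a literal CR into the sed expression without typing a
control character is to let printf produce it. This is only a sketch;
the sample data is invented, and file.old/file.new are the placeholder
names used above:

```shell
# Hold a literal carriage return in a variable
# (command substitution strips trailing newlines, but not \r).
cr=$(printf '\r')

# Invented sample: two DOS lines with trailing spaces before the CR.
printf 'data one   \r\ndata two   \r\n' > file.old

# Strip any trailing spaces plus the CR at the end of each line.
sed -e "s/ *${cr}\$//" file.old > file.new
```

This avoids relying on \r inside the sed expression, which not every
sed understands.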

If your file has exceptions to the CR,LF endings or if this isn't quite
what you want to do, perhaps you should explain a little more. :-)
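
If the dump shows bare CRs (no LF after them), which is what your
description of long lines with "data   ^Mdata   ^M" suggests, then
something like the following might be closer to what you asked for. It
is a sketch under that assumption, with invented sample data; myFile and
myFileCleaned are the names from your message:

```shell
# Invented sample: one long "line" of records, each ended by spaces and a bare CR.
printf 'data a   \rdata b   \rdata c   \r' > myFile

# Turn every CR into a newline, then trim the trailing spaces on each line.
tr '\r' '\n' < myFile | sed -e 's/ *$//' > myFileCleaned
```

Note that on a file with CR,LF endings this would leave an empty line
after every record, so check the hex dump first.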

Regards,
..jim


-- 
[email protected]
http://www.kernel-panic.org/cgi-bin/mailman/listinfo/kplug-list
