James G. Sack (jim) wrote:
> Ralph Shumaker wrote:
>> I want to do this:
>> cat myFile | sed -e "s/[ ]*CTRL-M/\n/g" > myFileCleaned
>> where Ctrl-M is character 0x013 and \n is a newline.
>>
>> I have a file that has many, many, many long lines, each with many sets
>> of data, each set being separated by many spaces, each instance of which
>> is ended by character 0x013, kinda like:
>> data ^Mdata ^Mdata ^Mdata ^Mdata ^M
>> but the spaces before the ^M are about 77 in number (seems to be
>> consistent), and the data strings are longer, containing several
>> elements, each separated by either one, two, or three spaces. If I can
>> match on any number of spaces ("[ ]*") which are immediately followed by
>> 0x013 (^M) and replace each instance with a newline, I'll be set (almost
>> certainly).
>
> As others have said, the ^M (0x0D, or CR "carriage return") may indicate
> you have a DOS format file, with line endings actually being a CR,LF
> combination (0x0D, 0x0A).
>
> (You said 0x013, but I think you may have been confusing
> decimal with hex, since hex 0x0D = decimal 13.)
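A quick way to check the hex/decimal mix-up from the shell, just as a sketch (the variable names cr_dec and dc3_dec are only mine for illustration; printf accepts C-style 0x constants for %d):

```shell
# Hex 0D is decimal 13, which is CR (Ctrl-M).
cr_dec=$(printf '%d' 0x0D)
# Hex 13 would be decimal 19, a different control character (DC3, Ctrl-S).
dc3_dec=$(printf '%d' 0x13)
echo "0x0D = $cr_dec, 0x13 = $dc3_dec"
```

So a byte shown as ^M should appear as 0d in a hex dump, not as 13.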
>
> Of course you may have a Mac-format file which uses bare CR for line
> delimiters.
>
> You should examine a piece of the file to find out for sure. There are
> several programs capable of giving (say) hex dumps -- od, hexdump, and
> my favorite xxd.
>
> xxd -g1 -l128 file.txt
> will look at the first 128 bytes and give hex (and string) output for
> each single byte. If you see things like
>   0000000: 68 65 6c 6c 6f 0d 0a 77 6f 72 6c 64 0d 0a  hello..world..
>
> then the '0d 0a' sequences confirm the DOS CR,LF format.
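If xxd isn't installed, od can do the same job. The following is only a sketch: sample_dos.txt is a made-up test file, and the dump and kind variables are names I picked for illustration. It builds a tiny CR,LF file, dumps it as hex bytes, and makes a rough guess at the line-ending style.

```shell
# Build a two-line sample with DOS (CR,LF) line endings.
printf 'hello\r\nworld\r\n' > sample_dos.txt

# -An drops the offset column, -tx1 prints one hex byte per column,
# -v prevents od from collapsing repeated lines into '*'.
# tr flattens the dump onto one line with single spaces between bytes.
dump=$(od -An -tx1 -v sample_dos.txt | tr -s ' \n' ' ')
echo "$dump"

# Rough classification: CR followed by LF suggests DOS,
# CR without any CR,LF pair suggests old Mac, no CR at all suggests Unix.
case "$dump" in
  *"0d 0a"*) kind=dos ;;
  *0d*)      kind=mac ;;
  *)         kind=unix ;;
esac
echo "line endings look like: $kind"
```

On a real file you would probably pipe only the first part through od (for example with head -c 128) rather than dumping the whole thing.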
>
>> I think I recall \n being the equivalent of a newline, although I may be
>> confusing things with my brief venture into perl.
>>
>> I did man regexp, but didn't find what I wanted. I'm not sure where
>> else to look.
>>
>> I'm sure that vim could probably do it, but I have already found that
>> trying to search for specific things in that complexity is like looking
>> for a tiny stainless steel needle in a humongous haystack. Magnets
>> won't do me any good.
>>
>> I have already had dealings with sed and regexp, and figured this would
>> be a good opportunity to pick up a new trick.
>>
>
> If you _want_ to use sed then man sed is the place to look. :-)
>
> If you do have a DOS file, then to strip the trailing space characters
> and the CR, you could do:
> sed -e's/ *CR//' file.old >file.new
Sorry, that should have been
    sed -e's/ *\r//' file.old >file.new
The "\r" is an escape sequence for CR (GNU sed understands it; other seds may not). :-[
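If your sed doesn't know "\r", one workaround is to put a literal CR into the script via a shell variable. This is a sketch under my own assumptions: the sample contents of file.old are made up, the variable name cr is arbitrary, and I've added a "$" anchor so only a CR at the end of a line is touched.

```shell
# Sample: DOS line endings with trailing spaces before each CR.
printf 'data one   \r\ndata two   \r\n' > file.old

# Command substitution strips only trailing newlines, so the CR survives.
cr=$(printf '\r')

# Strip any run of spaces plus the CR at the end of each line.
sed -e "s/ *${cr}\$//" file.old > file.new

# Show the result byte by byte; there should be no \r left.
od -c file.new
```

The LF at the end of each line is left alone, so file.new comes out as a plain Unix text file.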
>
> If your file has exceptions to the CR,LF endings or if this isn't quite
> what you want to do, perhaps you should explain a little more. :-)
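For the case described in the original question (many records on one long line, each padded with spaces and ended by a bare CR), something along these lines might do. It is only a sketch: the sample data is invented, and it assumes there are no CR,LF pairs in the file (those would turn into an extra blank line each).

```shell
# Sample: three records on one physical line, each padded with spaces
# and terminated by a bare CR (no LF anywhere).
printf 'data a   \rdata b   \rdata c   \r' > myFile

# Turn every CR into a newline, then trim the trailing spaces
# that are left at the end of each resulting line.
tr '\r' '\n' < myFile | sed -e 's/ *$//' > myFileCleaned

cat myFileCleaned
```

Doing the CR-to-newline step in tr avoids depending on how a particular sed treats "\n" in the replacement text.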
>
> Regards,
> ..jim
>
>
--
http://www.kernel-panic.org/cgi-bin/mailman/listinfo/kplug-list