On Saturday, 12 May 2018 9:16:52 AM AEST Daniel Frey wrote:
> Hi all,
> 
> I am trying to do something relatively simple and I've had something
> working in the past, but my brain just doesn't want to work today.
> 
> I have a text file with the following (this is just a subset of about
> 2500 dates, and I don't want to edit these all by hand if I can avoid it):
> 
> --- START ---
> December 2, 1994
> March 27, 1992
> June 4, 1994
> 1993
> January 11, 1992
> January 3, 1995
> 
> 
> March 12, 1993
> July 12, 1991
> May 17, 1991
> August 7, 1992
> December 23, 1994
> March 27, 1992
> March 1995
> --- END ---
> 
> As you can see, there's no standard in the way the date is formatted.
> Some of them are also formatted YYYY-MM-DD and MM-DD-YYYY.
> 
> I have a basic grep that I tossed together:
> 
> grep -o '\([0-9]\{4\}\)'
> 
> This does extract the year but yields the following:
> 
> 1994
> 1992
> 1994
> 1993
> 1992
> 1995
> 1993
> 1991
> 1991
> 1992
> 1994
> 1992
> 1995
> 
> As you can see, the two empty lines are removed but this will cause
> problems with data not lining up later on.
> 
> Does anyone have a quick tip for my tired brain to make this work and
> just output a blank line if there's no match? I swear I did this months
> ago and had something working but I apparently didn't bother saving the
> script I made. Argh!
> 
> Dan


You can add an alternative to the regular expression that matches the blank
lines, but the '-o' switch will still stop that match from being printed,
because it is an 'empty' match. The trick is to modify the data on the fly,
adding a space to the empty lines so that grep has something to print. I have
also added the '-E' switch to make the regular expression easier to read.

sed -e 's/^$/ /'  YOUR_DATA_FILE  | grep -o -E '([0-9]{4}|^[[:space:]]*$)'
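
Note that the pipeline above prints a single space, not a truly empty line,
for each blank input line, and a non-blank line that contains no four-digit
number is still dropped. If the output has to line up one-for-one with the
input, a rough alternative is to let awk print exactly one line per input
line. This is only a sketch: the function name extract_years is made up for
illustration, and it assumes the first run of four digits on a line is the
year, which holds for the YYYY-MM-DD and MM-DD-YYYY forms mentioned.

```shell
# Hypothetical helper: print the first four-digit run found on each input
# line, or an empty line when there is none, so that the output always has
# the same number of lines as the input.
extract_years() {
    awk '{
        # match() sets RSTART and RLENGTH when the pattern is found.
        # Four explicit [0-9] classes avoid relying on {4} interval support.
        if (match($0, /[0-9][0-9][0-9][0-9]/))
            print substr($0, RSTART, RLENGTH)
        else
            print ""
    }' "$@"
}
```

It reads standard input or any files given as arguments, for example:
extract_years YOUR_DATA_FILE > years.txt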


-- 
Reverend Paul Colquhoun, ULC.     http://andor.dropbear.id.au/
  Asking for technical help in newsgroups?  Read this first:
     http://catb.org/~esr/faqs/smart-questions.html#intro



