On Saturday, 12 May 2018 12:05:47 PM AEST Paul Colquhoun wrote:
> On Saturday, 12 May 2018 9:16:52 AM AEST Daniel Frey wrote:
> > Hi all,
> > 
> > I am trying to do something relatively simple and I've had something
> > working in the past, but my brain just doesn't want to work today.
> > 
> > I have a text file with the following (this is just a subset of about
> > 2500 dates, and I don't want to edit these all by hand if I can avoid it):
> > 
> > --- START ---
> > December 2, 1994
> > March 27, 1992
> > June 4, 1994
> > 1993
> > January 11, 1992
> > January 3, 1995
> > 
> > 
> > March 12, 1993
> > July 12, 1991
> > May 17, 1991
> > August 7, 1992
> > December 23, 1994
> > March 27, 1992
> > March 1995
> > --- END ---
> > 
> > As you can see, there's no standard in the way the date is formatted.
> > Some of them are also formatted YYYY-MM-DD and MM-DD-YYYY.
> > 
> > I have a basic grep that I tossed together:
> > 
> > grep -o '\([0-9]\{4\}\)'
> > 
> > This does extract the year but yields the following:
> > 
> > 1994
> > 1992
> > 1994
> > 1993
> > 1992
> > 1995
> > 1993
> > 1991
> > 1991
> > 1992
> > 1994
> > 1992
> > 1995
> > 
> > As you can see, the two empty lines are removed but this will cause
> > problems with data not lining up later on.
> > 
> > Does anyone have a quick tip for my tired brain to make this work and
> > just output a blank line if there's no match? I swear I did this months
> > ago and had something working but I apparently didn't bother saving the
> > script I made. Argh!
> > 
> > Dan
> 
> You can add an alternate regular expression that matches the blank lines,
> but the '-o' switch will still stop that match from being printed as it is
> an 'empty' match. The trick is to modify the data on the fly to add a space
> to the empty lines. I have also added the '-E' switch to make the regular
> expression easier.
> 
> sed -e 's/^$/ /'  YOUR_DATA_FILE  | grep -o -E '([0-9]{4}|^[[:space:]]*$)'


If the file contains no other type of data, just "lines with dates" and "blank
lines", then it can be done with the 'sed' command on its own:

sed -e 's/.*\([0-9][0-9][0-9][0-9]\).*/\1/' YOUR_DATA_FILE
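As a rough check of the above, here is a small sketch that runs the same 'sed' expression over a throwaway sample. The temporary file and its contents are made up for illustration, loosely based on the formats mentioned in the original question. The greedy '.*' picks up the last run of four digits on each line, and lines with no such run (the blank ones) pass through unchanged, so the line count is preserved.

```shell
#!/bin/sh
# Hypothetical sample data; the file and its contents are illustrative only.
sample=$(mktemp)
printf '%s\n' 'December 2, 1994' '1993' '' 'March 1995' \
              '1992-03-27' '03-27-1992' > "$sample"

# The leading '.*' is greedy, so \1 captures the LAST run of four digits.
# Lines without four consecutive digits do not match and are printed as-is.
years=$(sed -e 's/.*\([0-9][0-9][0-9][0-9]\).*/\1/' "$sample")
printf '%s\n' "$years"

rm -f "$sample"
```

One caveat: if a line contains a longer run of digits, this will pick out the last four digits of that run, so it is only safe when the years are the only four-digit (or longer) numbers in the file.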


-- 
Reverend Paul Colquhoun, ULC.     http://andor.dropbear.id.au/
  Asking for technical help in newsgroups?  Read this first:
     http://catb.org/~esr/faqs/smart-questions.html#intro



