On Saturday, 12 May 2018 12:05:47 PM AEST Paul Colquhoun wrote:
> On Saturday, 12 May 2018 9:16:52 AM AEST Daniel Frey wrote:
> > Hi all,
> >
> > I am trying to do something relatively simple and I've had something
> > working in the past, but my brain just doesn't want to work today.
> >
> > I have a text file with the following (this is just a subset of about
> > 2500 dates, and I don't want to edit these all by hand if I can avoid it):
> >
> > --- START ---
> > December 2, 1994
> > March 27, 1992
> > June 4, 1994
> > 1993
> > January 11, 1992
> > January 3, 1995
> >
> >
> > March 12, 1993
> > July 12, 1991
> > May 17, 1991
> > August 7, 1992
> > December 23, 1994
> > March 27, 1992
> > March 1995
> > --- END ---
> >
> > As you can see, there's no standard in the way the date is formatted.
> > Some of them are also formatted YYYY-MM-DD and MM-DD-YYYY.
> >
> > I have a basic grep that I tossed together:
> >
> > grep -o '\([0-9]\{4\}\)'
> >
> > This does extract the year but yields the following:
> >
> > 1994
> > 1992
> > 1994
> > 1993
> > 1992
> > 1995
> > 1993
> > 1991
> > 1991
> > 1992
> > 1994
> > 1992
> > 1995
> >
> > As you can see, the two empty lines are removed but this will cause
> > problems with data not lining up later on.
> >
> > Does anyone have a quick tip for my tired brain to make this work and
> > just output a blank line if there's no match? I swear I did this months
> > ago and had something working but I apparently didn't bother saving the
> > script I made. Argh!
> >
> > Dan
>
> You can add an alternate regular expression that matches the blank lines,
> but the '-o' switch will still stop that match from being printed as it is
> an 'empty' match. The trick is to modify the data on the fly to add a space
> to the empty lines. I have also added the '-E' switch to make the regular
> expression easier.
>
> sed -e 's/^$/ /' YOUR_DATA_FILE | grep -o -E '([0-9]{4}|^[[:space:]]*$)'
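
For anyone who wants to check the quoted sed/grep pipeline before running it on the real file, here is a minimal sketch. The sample lines and the temporary file are made up for illustration (they stand in for YOUR_DATA_FILE), and I have only assumed a grep that supports '-o' and '-E', such as GNU grep.

```shell
# Hypothetical sample data standing in for YOUR_DATA_FILE.
datafile=$(mktemp)
cat > "$datafile" <<'EOF'
December 2, 1994
1993

March 1995
1992-03-27
EOF

# Step 1: sed turns each empty line into a single space.
# Step 2: grep -o prints either the 4-digit year or the whitespace-only line,
#         which is now a non-empty match and therefore gets printed.
years=$(sed -e 's/^$/ /' "$datafile" | grep -o -E '([0-9]{4}|^[[:space:]]*$)')
printf '%s\n' "$years"

rm -f "$datafile"
```

This should print five lines, one per input line, with the third holding a single space rather than being truly empty. A line that has text but no 4-digit number would still be dropped by 'grep -o', so the output only lines up if every non-blank line contains a year.
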
If there is no other type of data in the file, just "lines with dates" &
"blank lines", then it can be done with just the 'sed' command on its own:

sed -e 's/.*\([0-9][0-9][0-9][0-9]\).*/\1/' YOUR_DATA_FILE

--
Reverend Paul Colquhoun, ULC.
http://andor.dropbear.id.au/
Asking for technical help in newsgroups? Read this first:
http://catb.org/~esr/faqs/smart-questions.html#intro