On Sat, May 12, 2018 at 2:16 AM, Daniel Frey <djqf...@gmail.com> wrote:
> Hi all,
>
> I am trying to do something relatively simple and I've had something
> working in the past, but my brain just doesn't want to work today.
>
> I have a text file with the following (this is just a subset of about
> 2500 dates, and I don't want to edit these all by hand if I can avoid it):
>
> --- START ---
> December 2, 1994
> March 27, 1992
> June 4, 1994
> 1993
> January 11, 1992
> January 3, 1995
>
>
> March 12, 1993
> July 12, 1991
> May 17, 1991
> August 7, 1992
> December 23, 1994
> March 27, 1992
> March 1995
> --- END ---
>
> As you can see, there's no standard in the way the date is formatted.
> Some of them are also formatted YYYY-MM-DD and MM-DD-YYYY.
>
> I have a basic grep that I tossed together:
>
> grep -o '\([0-9]\{4\}\)'
>
> This does extract the year but yields the following:
>
> 1994
> 1992
> 1994
> 1993
> 1992
> 1995
> 1993
> 1991
> 1991
> 1992
> 1994
> 1992
> 1995
>
> As you can see, the two empty lines are removed but this will cause
> problems with data not lining up later on.
>
> Does anyone have a quick tip for my tired brain to make this work and
> just output a blank line if there's no match? I swear I did this months
> ago and had something working but I apparently didn't bother saving the
> script I made. Argh!
>
> Dan
Here are awk and sed scripts for you to try:

$ cat dates
December 2, 1994
March 27, 1992
June 4, 1994
1993
January 11, 1992
January 3, 1995


March 12, 1993
July 12, 1991
May 17, 1991
August 7, 1992
December 23, 1994
March 27, 1992
March 1995
2018-05-12
05-12-2018

$ awk 'match($0,/[0-9][0-9][0-9][0-9]/){ print substr($0, RSTART, RLENGTH) }
/^$/
' dates
1994
1992
1994
1993
1992
1995


1993
1991
1991
1992
1994
1992
1995
2018
2018

$ sed 's/.*\([0-9][0-9][0-9][0-9]\).*/\1/p
/^$/p
d' dates
1994
1992
1994
1993
1992
1995


1993
1991
1991
1992
1994
1992
1995
2018
2018
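One caveat: both scripts print an empty line only when the input line is itself empty. A non-empty line that contains no four-digit run (a made-up example would be "n/a") is dropped entirely, so the output could still fall out of step with the input. If the goal is strictly one output line per input line, a minimal awk sketch that prints an empty line for every non-match might look like this. The function name extract_years is just a hypothetical wrapper so it can be tried on piped input, and the sample lines fed to it are invented for illustration:

```shell
#!/bin/sh
# Hypothetical helper: extract_years reads lines on stdin and writes exactly
# one line per input line: the first run of four digits, or an empty line
# when there is none (blank lines and lines without a year alike).
extract_years() {
    awk '{
        if (match($0, /[0-9][0-9][0-9][0-9]/))
            print substr($0, RSTART, RLENGTH)   # first 4-digit run found
        else
            print ""                            # keep the row, leave it empty
    }'
}

# Made-up sample input: a dated line, an empty line, a line with no year,
# and an MM-DD-YYYY date.
printf '%s\n' 'December 2, 1994' '' 'no year here' '05-12-2018' | extract_years
```

On the real file you would run extract_years < dates, or simply use the awk program on its own with dates as the file argument. Using if/else instead of a separate /^$/ rule is what guarantees every input line produces an output line.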